Introduction

Livestock production in India is subsidiary to plant agriculture. In tropical countries, ruminants are fed on lignocellulosic by-products such as cereal straws, tree foliage and oilseed cakes. Ruminants digest such plant materials by microbial processes. The rumen is characterized by its high microbial population density, high diversity and complexity of interactions. Bacteria predominate in the rumen, along with a variety of anaerobic protozoa, archaea and fungi [1], and the associated occurrence of bacteriophage is also well documented [2]. The use of small subunit (SSU) rRNA gene sequence analysis has allowed a more complete description of the rumen microbiome, and these inventories have demonstrated that a large microbial component remains uncharacterized and uncultured [3], especially a high proportion of the fibrolytic bacteria [4]. The rumen habitat contains a consortium of microorganisms that harbour the complex lignocellulosic degradation system for microbial attachment to, and digestion of, plant biomass. However, the complex chemical processes required to break down the plant cell wall are rarely carried out by a single species. Evidence also suggests that the most important organisms and gene sets involved in the most efficient hydrolysis of plant cell walls are associated with the fiber portion of the rumen digesta [5].

The bovine rumen provides a unique genetic resource for the discovery of plant cell wall-degrading microbial enzymes for use in biofuel production, presumably because of co-evolution of microbes and plant cell wall types [6]. However, there are limitations to metagenome mining [7], and the number of clones needed to represent the entire metagenome is staggering [8]. Nonetheless, this approach does allow one to begin to harvest the remarkable and vast diversity present in a given metagenome [9].

The sequencing of genomes from several hundred bacterial and numerous eukaryotic species has laid the foundation for generating genomic sequence data from whole environments without a culturing step. This approach, also known as “metagenomics”, is defined as the genomic analysis of an assemblage of microorganisms by direct extraction and cloning of their DNA [10]. One approach has been the use of pyrosequencing technology to increase the depth of SSU rRNA surveys by sequencing amplicons from the variable region of the SSU molecule [11–13]. The second approach uses random sample pyrosequencing to generate environmental gene tags (EGTs) and protein families [14] from microbiomes to highlight significant differences in metabolic potential in each environment [15]. Moreover, this massive-depth metagenomic sequencing is an invaluable complement to what has already been learned about lignocellulose degradation in the rumen. With the help of this technology, the rumen of buffalo (Bubalus bubalis) can be studied for diet- and management-dependent changes through metagenomic investigations of the microbial community and its metabolic potential. The data presented in this study are a comparative metagenome analysis of the Surti buffalo microbiome under different diet compositions using the inexpensive, massively parallel and rapid method of pyrosequencing.

Materials and methods

Rumen sampling

The permission of the Committee for the Purpose of Control and Supervision of Experiments on Animals (CPCSEA) was obtained prior to initiation of the study. Eight Surti buffaloes, 2–3 years of age and weighing approximately 200 kg, were divided into four groups of two animals and were housed at the Department of Animal Nutrition, College of Veterinary Science and Animal Husbandry, Anand. Each group received a different treatment in the form of an ad libitum total mixed ration (TMR), namely T1, T2, T3 and T4. The treatments had roughage to concentrate (R:C) ratios of 100:0, 75:25, 50:50 and 25:75, respectively, and were offered in individual feeding stalls. Concentrates were high-quality, low-fiber feeds that contained a high concentration of digestible energy per unit weight and volume. In this study, the concentrate diet consisted of 20.11% crude protein, 10.28% crude fibre, 3.80% ether extract, 52.43% nitrogen free extract, 13.38% ash, 3.38% silica, 1.00% phosphorus and 1.22% calcium. All animals were given free access for 2 h each morning and evening, during which they had free access to drinking water. The feeding experiment was conducted for a period of 30 days. A sample of rumen liquor (500 ml) was collected separately from each animal on the 30th day of the experiment, 2 h after the morning feeding, by a suction pump using a flexible stomach tube as described earlier by Khampa et al. [16]. About 100 ml of rumen fluid was passed through four layers of cheesecloth to remove particulate matter. The remaining rumen fluid was stored at −80°C for further study. Total genomic DNA was extracted from the four groups following a protocol similar to that for the extraction of high molecular weight DNA (enzyme–chemical–glass bead lysis method) described by Liles et al. [17], with minor modifications, and the Qiagen stool kit protocol. DNA purity and concentration were analyzed by spectrophotometric quantification and gel electrophoresis.
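The reported concentrate composition can be checked with a little arithmetic. In proximate analysis, nitrogen free extract is conventionally obtained by difference from the other analysed fractions. The sketch below is illustrative only: it assumes the percentages are on a common basis and that silica, phosphorus and calcium are sub-fractions of the ash rather than separate components (an assumption, not stated in the text). The function and variable names are hypothetical.

```python
# Proximate fractions of the concentrate (percent), as reported in the text.
# Assumption: silica, phosphorus and calcium are counted within the ash.
composition = {
    "crude_protein": 20.11,
    "crude_fibre": 10.28,
    "ether_extract": 3.80,
    "nitrogen_free_extract": 52.43,
    "ash": 13.38,
}


def nfe_by_difference(crude_protein, crude_fibre, ether_extract, ash):
    """Nitrogen free extract as the remainder after the analysed fractions."""
    return 100.0 - (crude_protein + crude_fibre + ether_extract + ash)


total = sum(composition.values())
nfe = nfe_by_difference(
    composition["crude_protein"],
    composition["crude_fibre"],
    composition["ether_extract"],
    composition["ash"],
)

print(f"sum of fractions: {total:.2f}%")
print(f"NFE by difference: {nfe:.2f}%")
```

Under these assumptions the five fractions account for the whole sample, and the by-difference value agrees with the reported nitrogen free extract.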

Pyrosequencing and sequence analysis

The four samples were sequenced on a pyrosequencing-based (454 Life Sciences technology) high-throughput sequencer (GS FLX, Roche) according to the manufacturer’s protocol. In brief, the samples were nebulised to generate fragments of 500–800 bp. The sequencing library was prepared by ligating adapters (A: 5′-CCATCTCATCCCTGCGTGTCCCATCTGTTCCCTCCCTGTCTCAG-3′ and B: 5′-CCTATCCCCTGTGTGCCTTGCCTATCCCCTGTTGCGTGTCTCAG-3′) to both ends of the fragments as described by the manufacturer. Emulsion PCR was carried out to clonally amplify the fragments on sequencing beads, followed by bead recovery and loading onto a PicoTiterPlate along with enzyme beads. Pyrosequencing was carried out for 200 cycles with sequential flows of A, T, G and C nucleotides, and images were captured. The captured images were processed by image processing software to obtain sequencing reads. The data were analyzed using the SEED annotation engine (http://seed.sdsu.edu/FIG/index.cgi) [18] on the metagenome rapid annotation using subsystem technology (MG-RAST) server. The sequences were compared using the BLASTX algorithm with an expectation (E) value cutoff of 1 × 10⁻⁵ [19]. The Ribosomal Database Project (RDP) was used for robust bacterial classification, and the European Ribosomal RNA database was used to classify eukaryotic and archaeal sequences. The sequences used in this article are freely available from the SEED platform (MG-RAST). The IDs used in this study are: 4445089.3, 4445091.3, 4445093.3 and 4446901.3.
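To illustrate what an E value cutoff does in practice, the following sketch filters BLASTX-style hits and keeps the best-scoring hit per read. It is not the MG-RAST implementation. It assumes hits are available in the standard 12-column BLAST tabular layout, and the read names, protein names and scores are invented for the example.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-5  # cutoff stated in the text

# Invented example rows (hypothetical read and protein names) in the
# 12-column BLAST tabular layout: query, subject, % identity, alignment
# length, mismatches, gap opens, q.start, q.end, s.start, s.end, E value,
# bit score.
EXAMPLE_HITS = "\n".join([
    "read1\tprotA\t85.0\t60\t9\t0\t1\t180\t10\t69\t1e-20\t95.0",
    "read1\tprotB\t60.0\t55\t22\t0\t4\t168\t30\t84\t1e-8\t50.0",
    "read2\tprotC\t40.0\t30\t18\t0\t1\t90\t5\t34\t0.01\t25.0",
    "read3\tprotD\t70.0\t45\t13\t0\t1\t135\t2\t46\t2e-6\t40.0",
])


def best_hits(handle, cutoff=E_VALUE_CUTOFF):
    """Return {query: (subject, evalue, bitscore)} for hits below the cutoff."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue >= cutoff:
            continue  # not significant at the chosen cutoff
        # Keep only the highest-scoring significant hit for each read.
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


hits = best_hits(io.StringIO(EXAMPLE_HITS))
for query, (subject, evalue, bitscore) in sorted(hits.items()):
    print(query, subject, evalue, bitscore)
```

Reads with no hit below the cutoff (here the hypothetical `read2`) simply drop out, which is why the number of EGTs is smaller than the number of reads.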

Results and discussion

The overall goal of this study was to obtain a detailed characterization of the rumen microbiome of Surti buffalo under four different feeding regimes, with respect to both phylotype (ribosomal DNA gene tags) and functional content (environmental gene tags; EGTs). The numbers of EGT hits against the SEED for Bacteria, Archaea and Eukarya differed among the four groups because of the unequal distribution of contig numbers (experimental bias). SSU rRNA hits in the Surti buffalo metagenomic libraries were very rare. For bacteria, they amounted to 0.02% in T1, 0.04% in T2, 0.13% in T3 and 0.21% in T4 (Table 1). Few SSU rRNA hits were also reported by Brulc et al. [20] in the cattle rumen microbiome. No SSU rRNA hits were observed for Archaea or Eukarya (European Ribosomal Database) in our microbiomes. However, three sequences (reads) from the T2 group (0.0295%) matched SSU rRNA database entries other than bacteria, archaea and eukarya (Table 1). These reads may originate from the host animal or reflect sampling bias.

Table 1 Summary of pyrosequencing data obtained from four different Surti buffalo (Bubalus bubalis) rumen samples

In theory, pyrosequencing should randomly sample the whole metagenome. From our analysis, there does not appear to be a bias with respect to location on the 16S rRNA molecule for the sequences retrieved from these four microbiome samples (Table 1). The phylogenetic composition of bacterial phyla for EGTs followed an unequal distribution across the metagenomic libraries (Fig. 1), again highlighting the randomness of the sequenced libraries, as expected for a random sampling of genes. Additionally, our previous report based on Sanger sequencing [21] showed that, in partial 16S rRNA gene libraries from the rumen of Surti buffalo fed green fodder Napier bajra 21 (Pennisetum purpureum), mature pasture grass (Dichanthium annulatum) and concentrate mixture, the sequences mostly belonged to Firmicutes (also known as the low G + C group) and to an unidentified group.

Fig. 1

Phylogenetic composition of bacterial phyla from four pyrosequenced environmental gene tag (EGT) datasets. The percent of sequences in each of the bacterial phyla from the A (T1, 100% roughage), B (T2, 75% roughage), C (T3, 50% roughage) and D (T4, 25% roughage) microbiomes is shown. The BLASTX cutoff for EGTs is 1 × 10⁻⁵ with a minimum length of 50 bp

Further insights into the diversity within the four treatment groups of buffalo rumen metagenomic samples were obtained by comparing the number of EGTs (E value <1 × 10⁻⁵) of different bacterial phyla (Fig. 1). Sequence length is one of the primary factors in assessing similarity between sequences, and BLAST E values depend on both the length of the query sequence and the length of the database to which it is compared [22]. This length dependence will affect the number of significant sequences found in the searches by a factor of two or more [23]. However, pyrosequencing yielded more sequence than comparable Sanger sequencing, more than compensating for these missing sequences. The sequences missed in our searches are expected to be randomly distributed and are therefore not expected to skew the comparative analysis. Indeed, while classifying EGTs from short pyrosequencing reads has been challenging, one report demonstrates that EGTs as short as 27 amino acids can be accurately classified, with an average specificity ranging from 97% for Superkingdom to 93% for Order [24].
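The dependence of E values on query and database length can be made concrete with the standard bit-score relation E = m × n × 2^(−S′), where m is the query length, n the database length and S′ the bit score. The sketch below is a simplified illustration that ignores the effective-length (edge) corrections applied by BLAST itself. The database size and read lengths are hypothetical values chosen only for the example.

```python
import math


def expect_value(bit_score, query_len, db_len):
    """E = m * n * 2**(-S'), ignoring effective-length corrections."""
    return query_len * db_len * 2.0 ** (-bit_score)


def min_bit_score(cutoff, query_len, db_len):
    """Bit score at which the E value equals the cutoff."""
    return math.log2(query_len * db_len / cutoff)


DB_LEN = 1e9       # hypothetical database size in residues
SHORT_QUERY = 33   # hypothetical short translated read (amino acids)
LONG_QUERY = 300   # hypothetical longer translated read (amino acids)

for m in (SHORT_QUERY, LONG_QUERY):
    needed = min_bit_score(1e-5, m, DB_LEN)
    print(f"query length {m}: bit score of about {needed:.1f} needed")

# Doubling the database doubles the E value of an otherwise identical hit.
ratio = expect_value(50, 100, 2 * DB_LEN) / expect_value(50, 100, DB_LEN)
print("E value ratio after doubling the database:", ratio)
```

Because E scales linearly with the search space, the same alignment can fall on either side of a fixed cutoff depending on database size. A short read also has less room to accumulate the bit score needed to reach the cutoff, which is one reason short reads yield fewer significant hits.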

Bacteria-specific EGTs represented approximately 70.73% in T1, 83.68% in T2, 79.24% in T3 and 64.88% in T4 of the total EGTs (Table 1), and the distribution of phylotypes fell predominantly into the Bacteroidetes/Chlorobi, Firmicutes and Proteobacteria groups, regardless of the microbiome analyzed (Fig. 1). The distribution of EGTs from the bacteria is congruent with the distribution of SSU rRNA phylotypes, as was found in the Soudan Mine, cattle rumen and chicken cecum microbiome studies [19, 20, 25]. Similar observations were also made in our previous high-throughput sequencing studies of the chicken gut microbiome and goat rumen microbiome (unpublished data). Archaeal EGTs constituted approximately 0.5%, 1.69%, 1.34% and 1.48% of the total EGTs in T1, T2, T3 and T4, respectively (Table 1), matching well with previous estimates of Archaea numbers in the adult chicken cecum [26] and cattle rumen [20] microbiomes. The majority of archaeal EGTs corresponded to Euryarchaeota, with the largest proportion corresponding to the methanogenic classes. Eukaryotic EGTs were 17.02%, 4.93%, 10.25% and 17.19% of total EGTs in T1, T2, T3 and T4, respectively (Table 1). The majority of eukaryotic EGTs belonged to the fungi/metazoan group. These EGT proportions were expected from our current knowledge of the rumen microbiome community structure. Interestingly, a few other EGTs were also recovered from the present microbiomes in all four groups (Table 1).
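As a rough cross-check of the figures above, the share of EGTs not assigned to Bacteria, Archaea or Eukarya can be computed as a remainder. This sketch assumes the three percentages for each group share the same denominator (total EGTs). The remainder would then cover viral and any other or unassigned EGTs, whose exact breakdown is given only in Table 1.

```python
# Percent of total EGTs per domain, as quoted in the text.
domain_share = {
    "T1": {"bacteria": 70.73, "archaea": 0.50, "eukarya": 17.02},
    "T2": {"bacteria": 83.68, "archaea": 1.69, "eukarya": 4.93},
    "T3": {"bacteria": 79.24, "archaea": 1.34, "eukarya": 10.25},
    "T4": {"bacteria": 64.88, "archaea": 1.48, "eukarya": 17.19},
}


def remainder(shares):
    """Percent of EGTs outside the three listed domains (by difference)."""
    return 100.0 - sum(shares.values())


other = {group: round(remainder(s), 2) for group, s in domain_share.items()}
for group, value in other.items():
    print(f"{group}: {value:.2f}% outside Bacteria/Archaea/Eukarya")
```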

Viral EGTs were very rare in the Surti buffalo rumen microbiome and were primarily from double-stranded (ds) DNA viruses. The scarcity of viral sequences may be due to their extensive diversity and limited representation in public databases. Additionally, viral sequences may have been overlooked because they were not enriched in the fluid during sampling procedures.

The subsystems-based annotation (SEED) database (MG-RAST) was used to gain a better understanding of these phylogenetic trends and to predict the metabolic potential (content of EGTs) of these microbiomes (Fig. 2). The EGT proportions were similar to the cattle rumen microbiome community structure [20]. The subsystems are annotated across genomes and are based on biochemical pathways, fragments of pathways, clusters of genes that function together, or any group of genes considered to be related. Much of this analysis is dependent on sequence databases, and while we tried to avoid database bias by using multiple databases and alternative querying algorithms for analysis, we are aware that some sequences have no matching relatives in the databases or are over-represented in the databases. Further, sequence identity does not always mean functional similarity, and this may influence the interpretation of our results, as minor sequence dissimilarities may represent functionally different or even completely new functions.

Fig. 2

SEED subsystem composition of the four rumen microbiomes: A (T1; 100% roughage), B (T2; 75% roughage), C (T3; 50% roughage) and D (T4; 25% roughage). The percent of environmental gene tags (EGTs) in each of the SEED subsystems from the rumen microbiomes (T2) is shown. The BLASTX cutoff for EGTs is 1 × 10⁻⁵

Figure 2 shows the metabolic profile (subsystems) of all four rumen microbiomes (T1, T2, T3 and T4). The distribution of subsystems in the T1 (100% roughage) microbiome is strikingly different from that in the T4 (25% roughage) microbiome, which is indicative of a community metabolism that has shifted away from roughage digestion toward a more easily fermentable carbohydrate-based metabolism. It appears that animals from group T4 have fewer fiber-metabolizing bacteria. This is consistent with the metabolism represented by the gammaproteobacteria. Although most of the gammaproteobacterial sequences were most similar to sequences from Psychrobacter-like organisms (from arctic samples), they are probably not from this genus but rather from a close relative [27]. Our results are also similar to those for other microbiomes [25, 28]: the Surti buffalo rumen microbiomes are dominated by carbohydrate metabolism, which is highest on the high-fiber diet (100% roughage) and lowest on the high-concentrate diet (75% concentrate), and are sparsely populated with genes for respiration, reflecting the more stable anoxic environment in the gastrointestinal tract (Fig. 3).

Fig. 3

Carbohydrate metabolism subsystem composition of the four rumen microbiomes: A (T1; 100% roughage), B (T2; 75% roughage), C (T3; 50% roughage) and D (T4; 25% roughage). The percent of environmental gene tags (EGTs) in each of the carbohydrate metabolism subsystems from the rumen microbiomes is shown. The BLASTX cutoff for EGTs is 1 × 10⁻⁵

Looking solely at the Surti buffalo (Bubalus bubalis) rumen microbiome and the SEED carbohydrate subsystem of all four groups, central carbohydrate metabolism accounted for a higher proportion in the T1 group (26%) than in the T4 group (14%), while the proportion of monosaccharide EGTs was lower in T1 (22%) and higher in T4 (35%). The carbohydrate EGTs of the T2 and T3 groups were present in similar proportions. These differences may arise because rumen pH, together with the microbial population, the nature of the substrates, environmental factors such as temperature, and the presence of cations and soluble carbohydrates, has been suggested as a factor governing bacterial attachment [29]. Ruminal pH is one of the most important of these factors, because fibrolytic bacterial numbers are very sensitive to changes in pH [30]. When ruminants are fed fiber-deficient rations, ruminal pH declines, microbial ecology is altered, and the animals become more susceptible to metabolic disorders [31].
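The contrast between the two extreme diets can be summarised as a percentage-point difference and a fold change for each carbohydrate category. The sketch below uses only the four percentages quoted above. The helper name is hypothetical, and no statistical significance is implied by these ratios.

```python
# Percent of carbohydrate-subsystem EGTs quoted in the text for T1 and T4.
carbohydrate_egts = {
    "central carbohydrate metabolism": {"T1": 26.0, "T4": 14.0},
    "monosaccharides": {"T1": 22.0, "T4": 35.0},
}


def compare(t1, t4):
    """Return (percentage-point difference, fold change) for T4 versus T1."""
    return t4 - t1, t4 / t1


summary = {}
for category, values in carbohydrate_egts.items():
    diff, fold = compare(values["T1"], values["T4"])
    summary[category] = (diff, round(fold, 2))
    print(f"{category}: {diff:+.0f} points, {fold:.2f}-fold (T4 vs T1)")
```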

While a limitation of the random sample pyrosequencing approach is the resulting short read lengths, we were able to assemble these reads into 280 total contigs of >500 nucleotides (39 from T1, 17 from T2, 187 from T3 and 37 from T4). Translations of these contigs (EGTs) were used for BLASTX analysis. The majority of these translations showed similarity with genes from the Bacteroidetes, Firmicutes and Actinobacteria, the dominant taxa in these four microbiomes. Similar observations have also been reported by Brulc et al. [20] in the cattle rumen and Qu et al. [25] in the chicken cecum microbiome.
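The per-group contig counts can be tallied to confirm the stated total and to show how unevenly the contigs are distributed, which is relevant to the experimental bias mentioned for the EGT hits. This is a minimal sketch using only the counts given in the text.

```python
# Contigs of >500 nucleotides per treatment group, as stated in the text.
contigs = {"T1": 39, "T2": 17, "T3": 187, "T4": 37}

total_contigs = sum(contigs.values())
# Share of all assembled contigs contributed by each group, in percent.
share = {group: 100.0 * n / total_contigs for group, n in contigs.items()}

print("total contigs:", total_contigs)
for group, pct in share.items():
    print(f"{group}: {contigs[group]} contigs ({pct:.1f}%)")
```

T3 contributes roughly two thirds of all contigs, so comparisons of raw contig-level hit counts between groups need to be read with that imbalance in mind.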

BLASTX analysis indicated that the contigs of the four microbiomes shared sequence similarities (27–100%) with different known organisms from the Bacteroidetes, Firmicutes, Actinobacteria, Proteobacteria, Spirochaetes and other groups, confirming the results from the non-assembled data, and many contigs had sequence similarity with hypothetical proteins. However, four contigs (one from the T1 group and three from the T3 group) did not show any match with the database, either because of bias in the assembly process or because they may be novel.

Metagenomic analysis allows the relative abundances of all genes to be determined and used to generate a dataset for the assessment of the functional potential of each community [18, 32, 33]. When we assembled genes from the four samples, most of the genes belonged to the Bacteroidetes/Chlorobi group, which suggests that this is an important phylum in the Surti buffalo rumen, similar to what has been observed in studies of the human faecal microbiome [9, 34], cattle rumen [20] and chicken cecum microbiome [25].

We conclude that the microbiome datasets presented herein represent the first assessment of the metabolic potential of the Surti buffalo (Bubalus bubalis) rumen microbiome at the functional gene level. As such, they represent a baseline for future studies and will be of great use in better understanding the large, complex and dynamic microbial community of the rumen fluid, and the co-evolution/selection of microbes with their host and diet. It is clear that the composition and function of the microbiome can be affected by various factors such as dietary ingredients, nutrient levels, environment, and probiotic and antibiotic treatments. Moreover, the gastrointestinal tract microbiome plays an important role in the growth and health of the host through its effects on gastrointestinal tract morphology, nutrition, pathogenesis of intestinal diseases, and immune responses. These microbiome data provide a critical genetic context for understanding animal nutrition, animal health and well-being. Additionally, the combination of the pyrosequencing approach and the subsystems-based annotations available in the SEED database allowed us to gain an understanding of the metabolic potential of these microbiomes. Sequence information was recovered in a comparative context based on the ecology of the microbial communities that inhabit the Surti buffalo rumen, which in the future will allow us to link metabolic potential to the identity of rumen microbes in their natural habitat.