1 Introduction

The rumen microbiome represents the totality of rumen microorganisms, their genetic elements, and environmental interactions. It plays an essential role in ruminant physiology, nutrition, and pathology as well as in host immunity. Rumen microorganisms convert plant fiber to short-chain fatty acids (SCFA), which contribute up to 75 % of the total metabolizable energy in ruminants. In addition to their fibrolytic capacity, rumen microorganisms participate in ruminal nitrogen metabolism, including dietary protein degradation. However, nitrogen losses during protein degradation and methane produced during rumen fermentation are substantial contributors to water and air pollution as well as global warming. Rumen microorganisms also produce large amounts of vitamins; as a result, ruminants generally do not need dietary supplementation of water-soluble vitamins and vitamin K. Rumen microorganisms are able to modulate nutrient absorption and may be among the major determinants of nutrient utilization efficiency (Li et al. 2012a; Jami et al. 2014). Moreover, ruminal biohydrogenation, the saturation of dietary unsaturated fatty acids by rumen microorganisms, can be manipulated for healthier meat products (Jenkins et al. 2008). Rumen microbes are well known to play a key role in detoxifying plant secondary compounds (Wallace 2008), and the involvement of the rumen microbiome in xenobiotic metabolism has been well documented (Li et al. 2014). Previous studies have identified rumen microbes responsible for the degradation of nitroaromatic explosive compounds, such as 2,4,6-trinitrotoluene (De Lorme and Craig 2009) and hexahydro-1,3,5-trinitro-1,3,5-triazine (Eaton et al. 2011).

The complexity of the rumen microbiome has long been appreciated, as evidenced by the presence of myriad microbial interactions (Li et al. 2012a). One of the major obstacles hindering our understanding of the structure and function of the rumen microbiome is that only approximately 11 % of rumen bacteria appear to be culturable (Edwards et al. 2004). DNA fingerprinting techniques widely used in earlier studies, such as terminal-restriction fragment length polymorphism (t-RFLP) and single-strand conformation polymorphism (SSCP), have limited throughput and low resolution and are therefore unable to provide a holistic view of the structure and function of the rumen microbiome. Furthermore, the rumen microbiome functions as a tightly integrated system in which all resident species interact closely to contribute to its emergent properties. Predominant species perform all major microbial conversions in this ecosystem. Nevertheless, numerically minor species also play an important role in maximizing rumen ecosystem outputs. Disruption of one species could cause a chain reaction and result in undesired or unpredicted consequences. These properties call for a move from studies of individual rumen microorganisms in isolation or in pure culture to community-level studies, especially in their natural habitats.

Metagenomics has emerged in the past few years as a powerful tool for studying the rumen microbiome, thanks to the advent of next-generation sequencing (NGS) technologies and rapid progress in reference databases and bioinformatic tools. Metagenomics addresses the collective genetic structure and functional composition of a microbial community without the bias of, or the necessity for, culturing its individual inhabitants (Galbraith et al. 2004). Rumen metagenomics thus enables comprehensive studies of the structure and function of the rumen microbiome using culture-independent approaches. It generally includes two major arenas: high-throughput screening of cloned expression libraries made from rumen metagenome DNA for gene products of interest (functional metagenomics) and sequencing-based characterization of the aggregate collection of genomes and genes present in rumen microbial communities, at both the DNA (metagenomics) and RNA (metatranscriptomics) levels. Functional screening technology was first applied to rumen materials to mine novel enzymes in 2005 (Ferrer et al. 2005), whereas the first publication using next-generation sequencing-based rumen computational metagenomics can be traced back to 2009 (Brulc et al. 2009). Since then, metagenomic technologies have been extensively utilized to investigate rumen microbial communities. The rumen of an individual animal is believed to harbor hundreds, and up to 1,000, microbial species. Therefore, microarrays (such as PhyloChips), DNA fingerprinting techniques, and traditional Sanger sequencing-based methods that are unable to provide a sequencing depth of greater than 1,000 sequence reads are excluded from discussion in this chapter. We summarize recent advances in metagenomic technologies and novel metagenomic insights into the structure and function of the rumen microbiome.

2 Functional Metagenomics

Functional metagenomics is the study of the collective genome of a microbial community by expressing it in a foreign host (Ekkers et al. 2012). The vast majority of enzymes that catalyze biochemical reactions are encoded by genes present in microbial communities under various environmental conditions. For example, a recently developed database lists 510 commercially useful enzymes used in various sectors, including agriculture, energy, and biomedicine (Sharma et al. 2010). Therefore, functional screening has become an increasingly important field for discovering novel biomolecules for applications in biotechnology and medicine. This approach relies on cloning of vast genetic diversity from a target habitat in various vectors (e.g., plasmids, cosmids, fosmids, or bacterial artificial chromosomes) and then expressing cloned metagenome libraries in foreign host systems (e.g., E. coli) followed by detection and characterization of desired functional activities in the expression libraries using various strategies (Simon and Daniel 2009). Functional screening provides direct access to largely unexploited microbial genetic diversity in the environment. Lignocellulose biomass, including cellulose, hemicellulose, pectin, and lignin, is the most abundant source of organic carbon on the planet. Efficient enzymatic conversion of biomass into biofuel has been of great interest recently. The complete degradation of lignocelluloses requires a concerted action of dozens of enzymes from various families, such as endo-β-1,4-glucanases, cellobiohydrolases, β-glucosidases, endoxylanases, β-xylosidases, α-l-arabinofuranosidases, acetyl xylan esterases, feruloyl esterases, and α-glucuronidases. Previous studies suggest that the rumen microbial ecosystem harbors a dazzling array of microbial diversity (Li et al. 2012a, b, c; Sparks et al. 2012) and is a rich source of efficient fibrolytic enzymes. 
A relatively small fraction of rumen microorganisms have been successfully cultured to date. The largely unexplored ruminal microbial diversity represents an untapped source of unique lignocellulose-digesting enzymes, especially those with multiple functions. Numerous efforts have been made to isolate fiber-digesting enzymes from the rumen, including various hydrolases from at least 8 glycosyl hydrolase families, such as GH3, GH5, GH8, GH9, GH10, GH13, GH26, GH43, GH48, and GH57. Morgavi et al. (2013) summarized the screening results prior to 2012. Results from functional screening of the rumen microbiome since 2012 are listed in Table 16.1.

Table 16.1 Lignocellulose-digesting enzymes mined from the rumen using functional metagenomic approaches since 2012

Although the abovementioned case studies demonstrate the huge potential of functional screening in mining genetic diversity for biotechnology applications, numerous challenges remain. First, only a small fraction of functional diversity is captured in expression libraries, partially due to the difficulty in expressing target genes in a foreign host. Moreover, current methods for detecting desired functions or enzyme activities have limited sensitivity, and the throughput of screening methods is relatively low. Novel strategies, such as fractionation of the microbial community using habitat biasing methods to reduce the complexity of the microbiome or to enrich desired activities, have been developed to overcome these limitations (Ekkers et al. 2012). Furthermore, the potential of in vitro compartmentalization (IVC) in combination with fluorescence-activated cell sorting (FACS) to aid functional screening of complex microbial ecosystems has been recognized (Ferrer et al. 2009). It is foreseeable that, in combination with rapid advances in directed evolution techniques and methods (Dalby 2011), functional screening of the rumen microbiome will yield more enzymes and biomolecules with improved activities for a wide range of applications.

3 Computational Metagenomics: Methods and Approaches

The advent of ultrahigh-throughput next-generation sequencing technologies and the rapid development of computational tools and resources have stimulated computational metagenomic studies. As a result, computational metagenomics provides novel insights into the structure and function of microbial communities from host-associated habitats or environmental samples at unprecedented resolution. The approach targeting small subunit (SSU) rRNA genes (16S or 18S) allows us to interrogate the microbial composition and structure of the rumen microbiome. The whole-genome shotgun (WGS) approach provides unique opportunities to gain novel insights into the protein repertoire and metabolic potential of the microbiome, which lead to biological pathway reconstruction. Moreover, the WGS approach enables taxonomic assignment to understand the microbial composition and structure of the rumen microbiota.

3.1 Ribosomal RNA Gene-Based Analysis

SSU ribosomal RNA genes, such as 16S rRNA genes for prokaryotes and 18S rRNA genes for eukaryotes, can be amplified from metagenomic DNA of various fractions of rumen materials. These two genes are most frequently used for phylogenetic analysis and microbial diversity studies in the rumen. The taxonomic informativeness of the 9 well-defined hypervariable regions (V1 to V9) of the 16S rRNA gene varies tremendously (Chakravorty et al. 2007). As a result, various primer combinations have been designed and their effects on classification accuracy compared (Nossa et al. 2010; Soergel et al. 2012). The position of primers and amplicon length are major determinants of taxonomic precision. Most importantly, the taxonomic informativeness of primers is habitat dependent. No primers are truly universal or work best in all environments (Soergel et al. 2012). For example, the primer pair 343F and 798R, targeting hypervariable regions V3 to V4, produces maximal classification accuracy under the current limitations of NGS platforms and may be the most suitable for human foregut microbiome studies (Nossa et al. 2010). Primer pairs targeting V1 to V3, V3 to V5, and V6 to V9 generally result in similar and accurate classification with minor bias (Vilo and Dong 2012). Indeed, primers targeting the V1 to V3 and V3 to V5 regions are commonly used in rumen microbiome studies (Table 16.2).

Table 16.2 A summary of metagenomic studies in ruminants using next-generation sequencing technologies

The amplicons from target regions of 16S (18S) rRNA genes are sequenced using next-generation sequencers, such as 454 FLX or Illumina sequencers. While barcoded pyrosequencing has been the mainstay in sequencing the 16S amplicons of rumen samples (Jami et al. 2013; Li et al. 2012c; Wu et al. 2012b), Illumina-based sequencing technology is increasingly gaining attention. The newly launched Illumina MiSeq sequencer with version 3 reagent kits enables generation of up to 25 million sequences with a length of up to 2 × 300 bp (paired end). The reagent cost for such a run is approximately $1400.

Raw sequence quality needs to be checked and then filtered and trimmed. Sequencing error and PCR single-base substitutions from the 454 platform can be removed using AmpliconNoise, a development of the PyroNoise algorithm that is capable of separately removing 454 sequencing errors and PCR single-base errors (Quince et al. 2011). The Perseus program can be used to remove chimeras. Processed 16S sequence reads are then analyzed using taxonomy-dependent and taxonomy-independent approaches. The taxonomy-dependent approach generally assigns 16S sequences to various levels of taxa based on sequence similarities to annotated sequences deposited in public databases. The commonly used SSU databases include EzTaxon-e (Kim et al. 2012b), Greengenes (DeSantis et al. 2006), RDP (Cole et al. 2009), and SILVA (Quast et al. 2013). The algorithm RDP Classifier (Wang et al. 2007) is among the most frequently used programs for taxonomic classification and has resulted in the publication of more than 400 articles since its launch. However, inherent limitations of this approach are obvious: (1) inability to assign novel sequences from previously undescribed species that have no matches in existing reference databases, (2) accuracy and robustness of taxonomic classification that is dependent on the coverage and quality of the database used, and (3) low resolution. This approach is often unable to assign input query sequences to species or strain levels. These limitations become more serious for rumen microbiome studies because the SSU sequences of rumen origin are particularly underrepresented in public databases. To overcome these limitations, the taxonomy-independent clustering approach has been developed. This approach uses various clustering algorithms to assign query sequences into operational taxonomic units (OTUs) based on a distance matrix at a specified threshold (Chen et al. 2013). Its independence from existing databases allows the analysis of novel sequences. 
More than 15 taxonomy-independent algorithms, such as CD-HIT-OTU (Fu et al. 2012; Li and Godzik 2006), CROP (Hao et al. 2011), ESPRIT (Sun et al. 2009) and ESPRIT-tree (Cai and Sun 2011), MOTHUR (Schloss et al. 2009), UClust (Edgar 2010), and UPARSE (Edgar 2013), have been published to date. A novel algorithm, TBC, incorporating the basic concept of taxonomy into clustering has been published (Lee et al. 2012c). Recently, the relative performance and parameters of some of these algorithms have been compared (Chen et al. 2013; Wu et al. 2012a). For the rumen data, CD-HIT-OTU performs well in our hands (Wu et al. 2012a; Li et al. 2012d). This algorithm, which uses a greedy incremental clustering process to identify OTUs from 16S rRNA gene sequences, is able to assign millions of reads in a relatively short time. Most importantly, the program avoids overestimation of OTUs, a common problem associated with many existing programs, and results in accurate estimation of microbial diversity. In addition to these algorithms, publicly accessible pipelines, such as MOTHUR and QIIME (www.qiime.org), are very popular in analyzing SSU sequences and have been widely used to analyze the rumen datasets (Castro-Carrera et al. 2014; Lee et al. 2012b; Omoniyi et al. 2014; Pitta et al. 2014; Pope et al. 2012). QIIME also wraps other applications, such as FastTree, PyNAST, RDP Classifier, and UClust. The microbial community structure between different samples can then be compared and visualized using UniFrac (Lozupone and Knight 2005) and Fast UniFrac (Hamady et al. 2010).
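
The greedy incremental clustering idea mentioned above can be illustrated with a minimal sketch. This is not the CD-HIT-OTU implementation: the function names are hypothetical, the identity measure is a simple ungapped position-by-position comparison (real tools use alignment or k-mer heuristics), and the 97 % threshold is only a commonly used default assumed here for illustration.

```python
def identity(a, b):
    """Ungapped identity over the shorter of two reads (toy assumption)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def greedy_otu_clustering(reads, threshold=0.97):
    """Greedy incremental clustering of reads into OTUs.

    Reads are visited longest first. Each read joins the first OTU whose
    representative it matches at or above `threshold`; otherwise it
    becomes the representative of a new OTU.
    """
    otus = []  # list of (representative, members) pairs
    for read in sorted(reads, key=len, reverse=True):
        for representative, members in otus:
            if identity(representative, read) >= threshold:
                members.append(read)
                break
        else:
            # No existing representative is close enough: start a new OTU.
            otus.append((read, [read]))
    return otus


if __name__ == "__main__":
    base = "ACGT" * 25                   # 100 bp toy read
    near = "TT" + base[2:]               # 2 mismatches, 98 % identity
    far = "T" * 10 + base[10:]           # 8 mismatches, 92 % identity
    for rep, members in greedy_otu_clustering([base, near, far]):
        print(rep[:12], len(members))
```

Because each read is compared only with cluster representatives rather than with every other read, the number of comparisons stays far below that of a full pairwise distance matrix, which is one reason approaches of this kind scale to millions of reads.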

3.2 Whole-Genome Shotgun Approach

WGS sequencing provides an opportunity to analyze both microbial diversity and functionality encoded in the genomes of rumen microbial communities. Driven by its application potential in metagenomics, numerous tools have been developed to analyze WGS data (Fig. 16.1). NGS technologies, such as the Roche 454 pyrosequencing, Illumina, Ion Torrent, and PacBio platforms, have significantly reduced the time and cost of metagenome sequencing and are revolutionizing metagenomic studies. Unique features and advantages of various NGS platforms, including future DNA sequencing technologies, have been extensively reviewed (Zhang et al. 2012). NGS generally relies on either synthesis or hybridization at a massively parallel scale. For example, the Illumina HiSeq 2500, in conjunction with improved version 4 chemistry, generates up to 1 terabase (1 trillion bases) of sequence data in a single run (~167 Gb per day). The ultralow cost, enormous throughput, and extreme convenience of these NGS technologies have directly contributed to their instant acceptance and utility in the metagenomic community. Sequences generated by NGS technologies have some unique characteristics, such as short read lengths, platform-specific biases, and relatively high error rates, which could have a significant impact on downstream analyses. The computational challenges in handling short sequence reads have been extensively discussed (Pop 2009; Pop and Salzberg 2008). The challenges generally include difficulties in dealing with repetitive sequences as well as the need to modify existing algorithms to handle platform-specific errors and high error rates (Pop 2009). Furthermore, the production of billions of reads in a single sequencing run by sequencers such as the Illumina HiSeq 2500 poses a tremendous challenge to computational resources.

Fig. 16.1 A typical workflow and computational tools for rumen metagenomics

Bioinformatic pipelines for NGS shotgun sequences generally include six steps: raw read quality control (QC) and trimming, assembly, functional annotation and metagenomic pathway reconstruction, taxonomy assignment, statistical analysis, and global network inference (Fig. 16.1). These processes have been extensively reviewed (Kim et al. 2013; Luo et al. 2013). The first step in dealing with WGS sequences involves QC, filtering, and trimming processes. Host sequence contamination can be removed using DeconSeq (Schmieder and Edwards 2011) or BLAST/Blat. Sequencing errors and PCR single-base substitutions as well as chimeras from the 454 platform can be removed using AmpliconNoise. Raw sequences generated by the Illumina platform can be trimmed using SolexaQA (Cox et al. 2010).
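
The quality-trimming step can be sketched as follows. This is a minimal illustration rather than the code of any tool named above: it keeps the longest contiguous stretch of a read whose Phred scores meet a cutoff, and it assumes Phred+33 quality encoding, a cutoff of 20, and a hypothetical function name, all chosen here for illustration only.

```python
def longest_high_quality_segment(seq, quality, min_q=20, offset=33):
    """Return the longest contiguous part of a read with Phred >= min_q.

    `quality` is the FASTQ quality string for `seq`; `offset` is the ASCII
    offset of the encoding (33 is assumed here).
    """
    scores = [ord(c) - offset for c in quality]
    best_start, best_len = 0, 0
    start = None
    # A sentinel score of -1 closes a high-quality run that reaches the 3' end.
    for i, q in enumerate(scores + [-1]):
        if q >= min_q:
            if start is None:
                start = i
        elif start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    end = best_start + best_len
    return seq[best_start:end], quality[best_start:end]


if __name__ == "__main__":
    # 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
    print(longest_high_quality_segment("ACGTACGTAC", "IIII#IIIII"))
```

Reads shortened below a minimum length by such trimming would normally be discarded before assembly; that filter is left out of the sketch.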

WGS sequences that pass these QC steps generally need to be assembled into longer contigs for downstream applications. Assembly improves functional annotation. The basic framework of NGS assembly includes 4 stages: preprocessing filtering, graph construction, graph simplification, and post-processing filtering (El-Metwally et al. 2013). More than a dozen short-read assemblers have been developed to facilitate the analysis of short WGS sequences (Huang et al. 2012; Miller et al. 2010). The de Bruijn graph-based approach is among the most commonly used in short-read de novo assemblers, such as ABySS (Simpson et al. 2009), EULER-USR (Chaisson et al. 2009), SOAPdenovo and its memory-efficient version, SOAPdenovo2 (http://soap.genomics.org.cn/soapdenovo.html), and Velvet (Zerbino and Birney 2008). Recently, efforts have been made to understand the unique features and limitations of these short-read assemblers (Zhang et al. 2011; Huang et al. 2012; Mende et al. 2012; Vazquez-Castellanos et al. 2014). For 454 pyrosequencing data, Newbler is among the widely used assemblers and has been used in the analysis of rumen WGS sequences (Li et al. 2012b). Genovo, a de novo assembler specifically designed for 454-based metagenomic sequences using a generative probabilistic model (Laserson et al. 2011), and its extended version, Xgenovo (Afiahayati et al. 2013), perform well and are able to generate long contigs (Vazquez-Castellanos et al. 2014). Our results using simulated metagenomic datasets show that ABySS and SOAPdenovo produce higher N50 values and require relatively little memory, while Velvet and SOAPdenovo produce higher genome coverage (Huang et al. 2012). Both Velvet and SOAPdenovo result in a lower percentage of contig chimerism. Although not specifically designed for metagenomic datasets, de Bruijn graph-based assemblers have proven appropriate for large datasets with hundreds of millions of short reads (Zhang et al. 2011) and have been used extensively in metagenomic studies. Recently, by making use of abundance differences and graph connectivity for the decomposition of the de Bruijn graph, an extended version of Velvet, MetaVelvet, has been developed to handle metagenomic data (Namiki et al. 2012). MetaVelvet is able to generate significantly higher N50 values than other short-read assemblers, leading to an increased number of predicted genes in our hands. Similarly, a de novo metagenomic assembler, Meta-IDBA (Peng et al. 2011), and its revised version, IDBA-UD (Peng et al. 2012), have been shown to generate longer contigs with high accuracy.
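
Because N50 is used above to compare assemblers, a short sketch of how it is usually computed may be helpful. The function name is hypothetical, and the definition assumed here is the common one: the length of the contig at which the running total, taken from the longest contig downward, first reaches half of the total assembly length.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Stop at the first contig that brings the sum to half the assembly.
        if running * 2 >= total:
            return length
    return 0


if __name__ == "__main__":
    # Total is 300 bp; 100 + 80 = 180 reaches the 150 bp midpoint.
    print(n50([100, 80, 60, 40, 20]))
```

A higher N50 indicates a more contiguous assembly but says nothing about correctness, which is why chimerism and genome coverage are reported alongside it.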

Genes or open reading frames (ORFs) are then predicted from assembled contigs using a variety of gene prediction (gene-calling) algorithms. A dozen gene-calling programs have been developed for metagenomic datasets, such as FragGeneScan (Rho et al. 2010), Glimmer-MG (Kelley et al. 2012), MetaGene (Noguchi et al. 2006) and MetaGeneAnnotator (Noguchi et al. 2008), MetaGeneTack (Tang et al. 2013), and Orphelia (Hoff et al. 2009). Gene prediction is an essential step in WGS metagenome data analysis for two reasons: (1) it is necessary for functional annotation and pathway reconstruction, and (2) it reduces the computational burden of protein similarity searches by nearly a factor of 6 compared to BLASTX (Trimble et al. 2012). In a direct comparison of 5 commonly used gene-calling algorithms, FragGeneScan performed better than MetaGeneAnnotator, MetaGeneMark, or Orphelia, especially for short reads (<1,000 bp) with sequence errors (Trimble et al. 2012). FragGeneScan combines sequencing error models and codon usage in a hidden Markov model to predict ORFs in short reads (Rho et al. 2010) and has been used in the publicly available MG-RAST pipeline (Wilke et al. 2013). Recently, an improved algorithm, MGC, has been published (El Allali and Rose 2013). This program relies on different models for regions with different GC contents and includes amino acid usage features to improve overall accuracy. It performs better in terms of sensitivity and specificity than both FragGeneScan and Orphelia on simulated metagenomic data (El Allali and Rose 2013).

To gain insights into functional potentials, predicted genes are further annotated against various public databases using a homology-based approach. For example, COG (Tatusov et al. 2000) and eggNOG (Powell et al. 2014) databases can be used to classify functional categories of predicted genes. Pfam (Punta et al. 2012), TIGRfam (Haft et al. 2003), and FIGfam (Meyer et al. 2009) databases can be used for protein family analysis. For rumen metagenomic data, the Carbohydrate-Active enZYmes database (CAZy), a database collecting and annotating the families of catalytic and carbohydrate-binding modules of enzymes that degrade, modify, or create glycosidic bonds, has been frequently used to mine fibrolytic enzymes in the rumen (Brulc et al. 2009; Hess et al. 2011). Pfam is a widely used database for protein family analysis and includes more than 14,800 annotated protein families in its latest release (v27.0). These families are also organized into groupings of related families (clans) based on similarity of sequences and structures. Furthermore, Gene Ontology (GO) can be extracted from these protein families using the Pfam2GO program (Hayete and Bienkowska 2005). The metagenomic features, such as COG functional profiles and metabolic subsystem data, between samples from two treatment groups can be analyzed using statistical packages, such as MetaStats (White et al. 2009). Metabolic pathways can then be reconstructed using databases such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa 2002) and BRENDA, the enzyme database (Schomburg et al. 2004). Metabolic pathways that differ between samples from two treatment groups can then be analyzed for statistical significance using MetaPath (Liu and Pop 2011) and Metagenomic Annotation Networks (Vey and Moreno-Hagelsieb 2012). Furthermore, network analysis tools and algorithms can be used to infer global co-occurrence patterns for the microbiome-wide microbial interactions (Faust and Raes 2012; Faust et al. 2012). 
The tools available for co-occurrence and association network analysis in metagenomic databases include CoNet (http://psbweb05.psb.ugent.be/conet/download.php), extended Local Similarity Analysis (eLSA) (Xia et al. 2011), QIIME, and a metagenome-wide association study (Qin et al. 2012).
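
A co-occurrence network of the kind discussed above can be sketched with a rank correlation between taxon abundance profiles. The example below is a simplified illustration, not the method of CoNet, eLSA, or QIIME: the function names, the Spearman measure, and the 0.8 cutoff are assumptions made here, and no correction for compositional effects or multiple testing is attempted.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank values from 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def cooccurrence_edges(abundance, threshold=0.8):
    """Return (taxon_a, taxon_b, rho) for pairs with |rho| >= threshold.

    `abundance` maps a taxon name to its abundances across the same samples.
    """
    edges = []
    for a, b in combinations(sorted(abundance), 2):
        rho = spearman(abundance[a], abundance[b])
        if abs(rho) >= threshold:
            edges.append((a, b, rho))
    return edges


if __name__ == "__main__":
    toy = {  # made-up abundances across five samples
        "taxon_A": [1, 2, 3, 4, 5],
        "taxon_B": [2, 4, 6, 8, 10],
        "taxon_C": [5, 4, 3, 2, 1],
        "taxon_D": [3, 1, 4, 1, 5],
    }
    for edge in cooccurrence_edges(toy):
        print(edge)
```

Positive edges suggest co-occurrence and negative edges mutual exclusion, but either pattern can arise from shared responses to diet or host factors rather than from a direct interaction.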

Recently, the use of statistical distribution methods that ignore known biological processes to handle metagenomic data has been questioned (Liberles et al. 2013). These authors suggest that mechanistic models based on ecological relationships, such as the predator–prey dynamics and competitive relationships that widely exist in the rumen microbial community, should be incorporated in metagenomic data analysis for ecological inference.

In addition to the individual bioinformatic tools and resources described above, several publicly available Web-based pipelines have been developed for metagenomic data analysis. These online platforms, such as CAMERA (http://camera.calit2.net/), IMG/M (http://img.jgi.doe.gov/), METAREP (http://jcvi.org/metarep/), and MG-RAST (http://metagenomics.anl.gov/), have integrated various tools for gene prediction, functional or protein family assignment, and protein interaction and metabolic pathway inference in a user-friendly format. For example, the latest version of the MG-RAST server integrates data uploading, QC, and annotation and analysis of various datasets, such as 16S or amplicon sequences and WGS metagenome and metatranscriptome sequences (Wilke et al. 2013). MG-RAST relies on FragGeneScan as a gene-calling program and BLAT for homology-based similarity search. Since its launch in 2007, more than 108,500 datasets have been analyzed.

WGS metagenomic sequence data not only enable analysis of the functionality and metabolic potential of the microbial community but also provide a means for taxonomic assignment (binning). Numerous tools, such as metaBEETL (Ander et al. 2013), have been developed for taxonomic classification of WGS data from microbial communities. These tools are generally divided into 3 major categories: homology- or similarity-based, composition-based, and hybrid approaches. The lowest common ancestor (LCA) algorithm has formed a basis for many of the similarity-based classification methods, such as WebCARMA (Gerlach et al. 2009), CloudLCA (Zhao et al. 2012), MEGAN (Huson et al. 2007), and DiScRIBinATE (Ghosh et al. 2010). The latter has been shown to significantly reduce binning time with superior assignment accuracy. Composition-based methods, which exploit the uniqueness of DNA base composition in genomes of different taxonomic entities, have been implemented in programs such as PhyloPythia (McHardy et al. 2007), Phymm (Brady and Salzberg 2009), TACOA (Diaz et al. 2009), and TaxSOM (Weber et al. 2011). The hybrid approach to binning, as in PhymmBL (Brady and Salzberg 2009) and RITA (MacDonald et al. 2012), generally combines similarity- and composition-based methods for better accuracy. However, these methods tend to be computationally time-consuming. To overcome this problem, a new algorithm, MetaPhlAn, has been developed (Segata et al. 2012). MetaPhlAn estimates the relative abundance of microbial cells by mapping short reads against a set of 400,141 clade-specific marker genes and allows for more accurate taxonomic assignment down to the species level within minutes of computational time for millions of WGS reads (Segata et al. 2012).
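
The core of the LCA idea can be shown in a few lines. The sketch below is an illustration only, not the implementation used in MEGAN or the other tools named above: it assumes each significant database hit has already been converted to a root-to-leaf lineage, and the function name and example lineages are supplied here for illustration.

```python
def lowest_common_ancestor(lineages):
    """Return the longest shared root-to-leaf prefix of the given lineages.

    Each lineage is a sequence of taxon names ordered from the root
    (e.g., domain) to the most specific rank available.
    """
    if not lineages:
        return ()
    common = []
    for taxa_at_rank in zip(*lineages):
        if all(taxon == taxa_at_rank[0] for taxon in taxa_at_rank):
            common.append(taxa_at_rank[0])
        else:
            break  # lineages diverge at this rank
    return tuple(common)


if __name__ == "__main__":
    p_ruminicola = ("Bacteria", "Bacteroidetes", "Bacteroidia", "Bacteroidales",
                    "Prevotellaceae", "Prevotella", "Prevotella ruminicola")
    p_bryantii = ("Bacteria", "Bacteroidetes", "Bacteroidia", "Bacteroidales",
                  "Prevotellaceae", "Prevotella", "Prevotella bryantii")
    f_succinogenes = ("Bacteria", "Fibrobacteres", "Fibrobacteria",
                      "Fibrobacterales", "Fibrobacteraceae", "Fibrobacter",
                      "Fibrobacter succinogenes")
    # A read hitting both Prevotella species is placed at the genus level.
    print(lowest_common_ancestor([p_ruminicola, p_bryantii])[-1])
    # Adding a Fibrobacter hit pushes the assignment up to the domain.
    print(lowest_common_ancestor([p_ruminicola, p_bryantii, f_succinogenes])[-1])
```

The example shows the characteristic trade-off of LCA-based binning: a read with ambiguous hits is assigned conservatively to a higher rank rather than to a possibly wrong species.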

3.3 Stable Isotope Labeling

Stable or “heavy” isotopes, such as 13C and 15N, can be used to label various substrates, from small molecules (e.g., glucose) and polysaccharides (such as inulin) to whole plants, an approach known as stable isotope probing (SIP). Microorganisms that utilize these labeled substrates will most likely incorporate the heavy isotope into their biomolecules, including DNA and RNA. The labeled DNA or RNA can be readily separated from unlabeled, normal “light” DNA or RNA by isopycnic density-gradient ultracentrifugation, in combination with magnetic-bead capture techniques and isotope ratio mass spectrometry. The enriched SIP-RNA/DNA can then be studied using standard molecular technologies. For example, SIP has been widely used to investigate community function in microbial ecosystems or genes responsible for bioremediation (Uhlik et al. 2013). The potential of SIP technology in metagenomic studies was immediately recognized. The combined approaches not only permit the detection of low-abundance species in a complex microbial community but also facilitate the discovery of novel enzymes and bioactive compounds (Chen and Murrell 2010). Moreover, SIP-metagenomic technologies enable the enrichment of metabolically active fractions of microorganisms from environmental samples or the gut microbiome, which can provide a powerful link between microbial phylogeny and metabolic functionality and activity. Such a link is important in understanding the role of rumen microbiota in normal physiology and pathogenesis. SIP-RNA technology has demonstrated that changes in the functional activity of the human gut microbiota are associated with nutrient sources and medium types (Reichardt et al. 2011). In ruminants, compounds labeled with stable isotopes have been widely used in nutrition studies. For example, 13C-labeled n-alkanes of plant origin are used as an internal tracer to assess digesta passage kinetics through the gastrointestinal tract (Warner et al. 2013). 
In addition, a stable isotope tracer, 13C-linolenic acid, has been used to investigate the biohydrogenation of linolenic acid in a bovine rumen microbial community (Lee and Jenkins 2011). In a small-scale repeated batch culture model of cattle fecal microbial communities, 13C-labeled fructose in combination with a modified t-RFLP molecular fingerprinting method identified Streptococcus bovis as the most dominant fructose fermenter and Lactobacillus vitulinus and Megasphaera elsdenii as minor fructose fermenters, while several species of Clostridium cluster IV were non-fermenters of fructose (Michinaka and Fujii 2012). It is conceivable that the importance of SIP-RNA/DNA technology in rumen metagenomic studies will soon be recognized.

3.4 Gnotobiotic Rumen Models

Gnotobiotic animals, including germ-free animals, have a well-defined microbial composition and provide an elegant model system to study myriad interactions between individual microbes and between microbes and the host. Microbial communities of varying complexity and origin can then be sequentially introduced into gnotobiotic animals to examine the effects of genetic background, dietary conditions, and physiological stages on microbial community structure and dynamics. Synthetic gut microbiota with known microbial composition and abundance can be created in germ-free animals. Once the complete genomes and transcriptomes of the introduced microbial species are known, these systems can be used to measure perturbation dynamics of the entire microbial community and to refine tools and algorithms for computational metagenomics.

Gnotobiotic lambs have been used to study rumen microbial establishment sequences and interactions of microbes of different functional groups for more than a decade (Fonty et al. 1989). The rumen of these animals harbors a defined microbial community. Rumen microbial species with known function can be sequentially added to the defined community. Therefore, gnotobiotic lambs are an ideal model to study the role of specific microorganisms and their interactions with other species in the rumen, such as the relationship between H2-producing and H2-consuming communities. Early results show that lambs lacking methanogens can be raised to adulthood, although their feed intake is lower than that of conventional lambs with functional methanogens (Fonty et al. 2007). A concomitant reduction in SCFA production as well as overall microbial complexity is also observed in these lambs. Moreover, acetogens can colonize and become rapidly established in the rumen of methanogen-free lambs, suggesting that their establishment is independent of other microbes, unlike cellulolytic bacteria, which generally require the presence of a diverse microbiota for establishment. On the other hand, methanogen colonization in the rumen does not substantially affect acetogen diversity (Gagen et al. 2012). Recently, interactions between fibrolytic species and methanogens have been examined using the gnotobiotic model (Chaucheyras-Durand et al. 2010). Methane emission is reduced when the dominant fibrolytic species in the rumen is a non-H2 producer, such as Fibrobacter succinogenes, compared with a rumen dominated by H2-producing fibrolytic species, such as ruminococci and anaerobic fungi. These results suggest that dietary intervention strategies to promote non-H2-producing fibrolytic species may represent a novel approach to mitigate methane production in farm animals. Utilization of metagenomic tools in the gnotobiotic rumen model will facilitate our understanding of the microbial establishment sequence and succession of the rumen microbiome.

3.5 Metatranscriptomics, Metaproteomics, and Metabolomics

Sequencing-based metagenomics addresses the collective genetic structure and functional composition of a microbial community at the DNA level in a culture-independent manner. While vitally important, DNA-based metagenome characterization does not by itself reveal how the genetic information of a given microbiota is actually expressed. To characterize how genes in the metagenome are expressed and regulated, metatranscriptomics, a comprehensive measure of mRNA transcript abundance, dynamics, and regulation under a variety of environmental conditions or developmental and physiological/pathological stages, has been developed (Lim et al. 2013; Qi et al. 2011). While metatranscriptomic analysis provides insights into how the metagenome is expressed and regulated, metaproteomics allows comprehensive characterization of the gene products (proteins) encoded in the metagenome and their posttranslational modifications and turnover. Metaproteomics has recently been applied to analyze the human salivary supernatant (Jagtap et al. 2012). However, due to limitations in accurate detection and mass measurement of peptides and their annotation, metaproteomics currently allows characterization of only a relatively small fraction of the gene products in a complex gut microbiota (Wilmes and Bond 2006). Similarly, metabolomics, a comprehensive survey of metabolites in the host, its diet (forage), and its rumen microbiome, has been conducted to provide information on key players responsible for microbiome function (Lee et al. 2012b; Kingston-Smith et al. 2013). Metabolomic profiling using nuclear magnetic resonance spectroscopy, in combination with 454 pyrosequencing, demonstrates the uniqueness of the microbial composition and metabolites in the rumen of Korean native goats (Lee et al. 2012b). Metabolomic data facilitate the study of interactions between bacteria-specific metabolites and host proteins (Jacobsen et al. 2013). Together, rapid integration of these omics technologies, including metagenomics for DNA, metatranscriptomics for RNA, metaproteomics for proteins and peptides, and metabolomics for metabolites, will provide holistic insight into the structure and function of the rumen microbiota.

4 Metagenomic Insights into the Structure and Function of the Rumen Microbiome

4.1 Microbial Establishment and Succession During Rumen Development

Microbial products, such as SCFAs, play a critical role in rumen development. As a result, numerous attempts have been made to understand microbial establishment sequences and ecological succession of the developing rumen microbiome. It is generally accepted that the rumen is sterile at birth. Earlier studies demonstrate that bacteria start colonizing the rumen within the first 24 h of life and that strictly anaerobic bacteria become predominant by the second day after birth (Fonty et al. 1989). Major functional groups of microorganisms, including fibrolytic bacteria and methanogens, become established in the rumen within the first week of life, followed by protozoa (Morvan et al. 1994; Quigley et al. 1985). Our knowledge of microbial establishment and succession in the developing rumen and during the transition to the mature rumen has been significantly expanded (Li et al. 2012b; Jami et al. 2013; Malmuthuge et al. 2014), largely due to the advent of metagenomic technologies. For example, it was generally accepted that methanogens start colonizing the rumen 3–4 days after birth (Fonty et al. 1989), whereas a recent study shows that methanogens can be detected in the ovine rumen 17 h after birth (Gagen et al. 2012). A systematic cataloging of microbial diversity and functionality in the developing rumen using both 16S rRNA gene-based and WGS approaches has been attempted (Li et al. 2012a, b, c). Sequences from more than 24 prokaryotic phyla and 22 eukaryotic phyla are identified in rumen microbial communities of preruminant (14-day-old and 42-day-old) calves fed the same milk diet. Furthermore, the rumen microbiome of preruminant calves harbors considerable functional diversity, as evidenced by the existence of 8,298 Pfam protein families. A total of 156 and 120 genera are identified in the rumen microbiota of 14-day-old and 42-day-old calves, respectively. Fibrolytic bacteria and glycoside hydrolase protein families are abundant in the developing rumen before the calves are fed a solid diet. Moreover, the fibrolytic capability of the developing rumen increases as calves age. Interestingly, rumen development has a significant impact on microbial diversity. The genus-level richness indices ACE and Chao1 are significantly higher in the rumen of 14-day-old calves than in that of 42-day-old calves. The rumen microbiome of younger calves displays a more heterogeneous microbial composition and harbors a greater number of bacterial genera (many of which may be transient) than that of older calves fed the same milk replacer diet (Li et al. 2012b). This observation is in agreement with the general ecological theory that biodiversity tends to increase during early succession as new species arrive but may decline in later succession as competition eliminates opportunistic species. Rumen microbial composition changes from birth to adulthood have been monitored using pyrosequencing (Jami et al. 2013). This study shows that some fibrolytic species, such as Ruminococcus albus, are detectable in the developing rumen as early as one day of age, while another major fibrolytic species, Fibrobacter succinogenes, appears much later. The presence of fibrolytic capacity in the developing rumen prior to exposure to a solid diet is in agreement with previous observations in calves (Li et al. 2012b) and in human infants (Koenig et al. 2011). Developmental stage appears to be one of the major determinants of rumen microbial establishment, as evidenced by the significant differences observed in rumen microbial composition between 14-day-old and 42-day-old calves (Li et al. 2012b) and between 6-month-old and 2-year-old cattle (Jami et al. 2013) fed the same diet. A direct comparison of microbial establishment sequences in the bovine rumen (Jami et al. 2013) and the hindgut of human infants (Koenig et al. 2011) identifies similar wavelike patterns, coincident with critical life events such as changes in diet, development, and health status, suggesting that similar forces may drive the establishment of microbial communities in the two different habitats (Jami et al. 2013).
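
As a minimal sketch of how a genus-level richness index such as Chao1 is computed, the following Python example estimates total richness from the numbers of genera seen exactly once (singletons) and exactly twice (doubletons). It uses the bias-corrected form of the estimator, which is an assumption here, since the cited studies do not state which variant was applied; the function name and the read counts are hypothetical and serve only as illustration. ACE is not shown.

```python
def chao1(counts):
    """Bias-corrected Chao1 richness estimate from per-genus read counts."""
    observed = [c for c in counts if c > 0]
    s_obs = len(observed)                      # genera actually observed
    f1 = sum(1 for c in observed if c == 1)    # singletons
    f2 = sum(1 for c in observed if c == 2)    # doubletons
    # Bias-corrected form; defined even when there are no doubletons.
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical read counts per genus for one rumen sample (not real data).
genus_counts = [120, 45, 9, 2, 2, 1, 1, 1, 0]
estimate = chao1(genus_counts)
print(f"observed genera: {sum(c > 0 for c in genus_counts)}, Chao1: {estimate:.1f}")
```

In this sketch the estimate exceeds the observed number of genera only when rare genera are present, which is why a community with many transient, low-abundance genera tends to yield a higher Chao1 value.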

4.2 Rumen Microbial Diversity and the Core Rumen Microbiome

The collective microbial diversity in the rumen has been illustrated by a meta-analysis of all curated 16S rRNA gene sequences of rumen origin (13,478 bacterial and 3,516 archaeal sequences) deposited in the RDP database (Kim et al. 2011). This analysis identified a total of 19 phyla, as well as 5,271 and 942 OTUs for rumen bacteria and archaea, respectively. Approximately 1,000 OTUs are likely present in the rumen of two fistulated cows, and 587 OTUs are detected in all 4 samples from these cows (Hess et al. 2011). In our studies, a total of 21 phyla are collectively identified from the rumen microbiome of dairy cows (Li et al. 2012c), with a mean of 16 phyla for the mature rumen of dairy cattle (Wu et al. 2012a). The mean number of genera in the rumen of individual cattle is 79.9 ± 14.0 (mean ± SD). In the mature rumen of dairy and beef cattle, the mean numbers of OTUs identified are 512 and 343, respectively. Together, these results suggest that the number of microbial species in a typical rumen of an individual animal is likely in the range of several hundred.

The rumen microbiome is highly responsive to diet (Ellison et al. 2014), developmental stage (Li et al. 2012b; Jami et al. 2013), genetics, and numerous environmental factors. Substantial variations exist in microbial composition and functionality among individual rumen samples within a species and between ruminant species (Jami and Mizrahi 2012). It is probable that a set of core taxa or OTUs (species) is shared by the rumen microbiomes of individual animals across all ruminant species, despite this large variation. The core rumen microbiome, consisting of a common set of microbial taxa that are shared by all individual samples, may contribute to basic rumen function. Defining the core rumen microbiome is important for understanding the basic structure and function of the rumen microbiome and has recently been attempted (Li et al. 2012b, c; Wu et al. 2012a; Petri et al. 2013b). The core bovine rumen microbiome, spanning both the developing and mature rumen of dairy and beef cattle, consists of 8 phyla, 11 classes, and 15 families (Wu et al. 2012a). These 8 phyla, accounting for 99.5 % of input 16S sequences, are Bacteroidetes, Firmicutes, Proteobacteria, Spirochaetes, Fibrobacteres, Verrucomicrobia, Synergistetes, and Actinobacteria, in descending order of relative abundance. The core bovine rumen microbiome likely represents the minimal components of the rumen microbial community. As Table 16.2 shows, only a small number of the approximately 150 ruminant species have been systematically studied for rumen microbial diversity using metagenomic tools. Differences in rumen microbiome composition between ruminant species have been investigated (Kittelmann et al. 2013). Recently, we have compared the microbial community compositions of the bovine (N = 8), caprine (N = 10), and ovine (N = 10) rumen in order to define the core rumen microbiome using deep 16S sequencing. The number of 16S rRNA gene sequences per sample is 79,213.0 ± 11,682.2 (mean ± SD; N = 28; mean read length = 300 bp), a sequencing depth estimated to reach 99.9 % coverage (Kim et al. 2011). Our preliminary results show that, collectively, 22 phyla and 94 families are detected in the rumen microbiome of cattle (cows), goats, and sheep. The mean number of phyla per animal in the rumen microbiome of cattle, goats, and sheep is 16.8, 11.8, and 16.9, respectively. The caprine rumen has a significantly smaller number of phyla than the bovine and ovine rumen (P < 10⁻⁵) in this study. The family-level composition follows a similar trend. The core rumen microbiome consists of 8 phyla, Actinobacteria, Bacteroidetes, Euryarchaeota, Firmicutes, Proteobacteria, Spirochaetes, Synergistetes, and Verrucomicrobia, which is consistent with our previous study of the core bovine microbiome (Wu et al. 2012a).
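
The definition of a core microbiome as the set of taxa shared by all individual samples can be sketched in a few lines of Python. The example below is illustrative only: the function name, the `min_prevalence` parameter (an assumed relaxation of the strict "all samples" rule), the sample names, and the read counts are hypothetical and are not taken from the studies cited above.

```python
def core_taxa(samples, min_prevalence=1.0):
    """Return taxa detected in at least `min_prevalence` of the samples.

    `samples` maps a sample name to a {taxon: read_count} dictionary.
    With the default of 1.0, a taxon must be present in every sample.
    """
    n_samples = len(samples)
    detections = {}
    for counts in samples.values():
        for taxon, reads in counts.items():
            if reads > 0:
                detections[taxon] = detections.get(taxon, 0) + 1
    return {t for t, k in detections.items() if k / n_samples >= min_prevalence}


# Hypothetical phylum-level read counts for three animals (not real data).
samples = {
    "cow_1": {"Bacteroidetes": 510, "Firmicutes": 300, "Fibrobacteres": 12, "Tenericutes": 3},
    "goat_1": {"Bacteroidetes": 240, "Firmicutes": 410, "Fibrobacteres": 0, "Tenericutes": 5},
    "sheep_1": {"Bacteroidetes": 350, "Firmicutes": 330, "Fibrobacteres": 9},
}
print(sorted(core_taxa(samples)))
```

Under the strict definition only taxa present in every animal are retained, so the size of the core depends on both sequencing depth and the number of animals sampled; lowering `min_prevalence` trades strictness for tolerance of occasional non-detection.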

The 15 families constituting the core rumen microbiome (Table 16.3), representing >95 % of assigned 16S sequences in each sample, likely contribute to the basic function of the rumen microbial ecosystem. However, despite their high prevalence, the relative abundance of these 15 families varies significantly among the 3 host species. For example, the abundance of Prevotellaceae, the most abundant family in the rumen of all 3 species, is significantly lower in the caprine rumen (24.26 %) than in the bovine (51.12 %) and ovine (35.16 %) rumen. The abundance of 4 other families, Acidaminococcaceae, Desulfobulbaceae, Campylobacteraceae, and Succinivibrionaceae, also differs significantly between species. The abundance of Ruminococcaceae is significantly higher in the ovine rumen (19.74 %) than in the bovine (8.18 %) and caprine (4.43 %) rumen. On the other hand, the abundance of Lachnospiraceae is relatively stable in the rumen of the 3 species (between 22.00 and 24.35 %).

Table 16.3 The relative abundance of the 15 families consisting of the core rumen microbiome

4.3 Rumen Virome and Plasmidome

Previous studies have focused on bacterial and archaeal diversity in the rumen. The lack of universally conserved genes or proteins among viruses and phages, analogous to the 16S rRNA genes of prokaryotes, has hindered their discovery. The advent of computational metagenomics has made it possible to systematically catalog viruses in the rumen and survey phage diversity. A recent study shows that more than 20,000 viral genotypes exist in the rumen of 2 of the 3 cattle analyzed (Berg Miller et al. 2012). While over 70 % of viral sequences have no significant matches to sequences in public databases, sequences associated with prophages outnumber those associated with lytic phages approximately 2:1. Moreover, rumen viral sequences carry functional genes, and, as expected, the majority of these genes belong to the phages, prophages, transposable elements, and plasmids subsystem of the SEED database. In another study, 14 putative viral sequences over 30 kb are identified (Ross et al. 2013). Cows housed together and fed the same diet display more similar taxonomic virome profiles than those housed separately. Intriguingly, these two cohorts have similar functional profiles, suggesting that the rumen virome is functionally conserved (Ross et al. 2013). Together, these results provide further evidence that viruses play an important role in horizontal gene transfer between microorganisms, in spreading antibiotic resistance genes, in controlling bacterial population dynamics, and in affecting animal nutrition and protein metabolism in the rumen.

Due to a relatively low copy number per bacterial cell and the difficulty in distinguishing them from chromosomal DNA, rumen plasmids have not been systematically studied until recently (Brown Kav et al. 2012; Mizrahi 2012). Brown Kav et al. (2013) recently reported a method to enrich plasmid DNA from the rumen. The method takes advantage of an exonuclease that digests chromosomal DNA, which is sheared and linearized during extraction. The remaining circular plasmid DNA is amplified using phi29 DNA polymerase and sequenced on an Illumina sequencer. This study provides a first glimpse of the diversity and function of the rumen plasmidome (i.e., the collective plasmid population of rumen origin). Notably, while plasmid sequences assign the rumen microbial hosts to the three major phyla, Firmicutes, Bacteroidetes, and Proteobacteria, their relative abundance differs significantly from the phylogenetic assignment based on 16S sequences from the very same rumen source. For example, Proteobacteria account for approximately 20 % of rumen plasmid sequences, whereas their abundance appears to be significantly lower (~5 %) according to the 16S approach (Brown Kav et al. 2012). Functional analysis using the SEED database suggests that, in addition to the intrinsic plasmid-coding functions, the rumen plasmidome shares similarities with rumen metagenomes and displays a significantly higher representation of functional categories such as “amino acids,” “cell wall and capsule,” “cofactors, vitamins, etc.,” and “protein metabolism.” These results suggest that rumen plasmids may play an important role in lateral gene transfer between rumen microorganisms.

4.4 Resistance and Resilience of the Rumen Microbiome

One of the primary functions of the rumen is to degrade lignocellulosic fiber to produce short-chain fatty acids for energy. The relative stability of the structure and composition of the rumen microbiome is a prerequisite for this function. Stability can be defined as (1) the ability to return to an equilibrium state following perturbation (resilience) and (2) the ability to resist change (resistance); it can also be described by the rate of return to equilibrium following perturbation or by overall system variability (Robinson et al. 2010). The stability of the rumen microbiome therefore imparts resilience to perturbation, ensuring continued rumen function.

The rumen microbiome is susceptible to both natural and anthropogenic stresses and is highly responsive to changes in environmental conditions and host factors, such as critical physiological or pathological events, which can create novel niches for other microbial species. The microbial diversity of the rumen reflects the coevolution between microbial communities and their host and represents an equilibrium between the functional redundancy of a stable community and niche specialization. Although a few dominant microorganisms are likely to be responsible for the majority of the metabolic activity and energy influx in the rumen, it is well known that uncommon species often serve as a reservoir of genetic and functional diversity, playing key roles in microbial ecosystems. These species can become numerically important if environmental conditions change. Recently, the scientific community has begun to study the extent of temporal and spatial shifts in the functionality and phylogenetic composition of the rumen microbiome in response to various stresses, such as diet, critical life events (such as weaning and acidosis), and antibiotic usage, as well as their ecological and physiological implications (Li et al. 2012b, c; Jami et al. 2013; Petri et al. 2013a, b).

We have characterized temporal changes of the rumen microbiome of dairy cows in response to an exogenous butyrate disturbance (Li et al. 2012c). We reanalyzed the raw data using improved bioinformatic tools for this chapter. Our results indicate that in the rumen microbiome of dairy cows in mid-lactation, the five most abundant phyla, Bacteroidetes, Firmicutes, Proteobacteria, Fibrobacteres, and Spirochaetes, in this order, account for >99 % of observed 16S sequences. A 168-h exogenous butyrate perturbation results in significant changes in the abundance of 4 of the 5 most abundant phyla. The relative abundance of Bacteroidetes and Fibrobacteres is significantly decreased, from 68.20 % at the basal level to 56.74 % after perturbation and from 1.43 % to 0.82 %, respectively. On the other hand, the abundance of Firmicutes and Spirochaetes is significantly increased, from 23.34 % to 30.32 % and from 0.98 % to 2.17 %, respectively. The phylum Firmicutes includes a majority of butyrate-producing bacteria. The observation that exogenous butyrate increases the relative abundance of Firmicutes suggests that butyrate itself may be butyrogenic. In addition, a readily available energy source (exogenous butyrate) reduces the need for fibrolytic capacity in the rumen, resulting in a decreased abundance of Fibrobacteres. The abundance of these 5 phyla returns to the pre-disturbance level 168 h after perturbation withdrawal, suggesting the resilient nature of the rumen microbiome. The rumen microbiome also recovers 1 week after a diet-induced acidosis challenge (Petri et al. 2013a). The analysis at the family level demonstrates the same trend. The perturbation with exogenous butyrate significantly impacts 7 of the 20 most abundant families (accounting for 99.8 % of the sequences), including the 3 most abundant families, Prevotellaceae, Lachnospiraceae, and Ruminococcaceae. After the perturbation is withdrawn, the abundance of these families returns to pre-disturbance levels. Among the top 20 families, a long-lasting impact of the disturbance is observed for only 2 minor families: Anaeroplasmataceae remains elevated, while Acetobacteraceae is further repressed, 168 h after perturbation withdrawal. Our data demonstrate that the rumen microbial ecosystem displays substantial resilience to short-term disturbances. Furthermore, considerable hysteresis of the rumen microbiome is observed. The ecological role and consequences of the two families, Anaeroplasmataceae and Acetobacteraceae, in the newly established rumen microbial community are worthy of further scientific attention.
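
The size of the phylum-level shifts reported above can be made easier to compare with a short calculation. The following Python sketch uses only the basal and post-perturbation relative abundances quoted in the text; the function name and the choice to report both the percentage-point change and the change relative to the basal level are illustrative assumptions, not part of the original analysis.

```python
def abundance_shift(before, after):
    """Return (percentage-point change, relative change in %) for one taxon."""
    delta = after - before
    return delta, 100.0 * delta / before


# Relative abundance (%) at the basal level and after the butyrate
# perturbation, as reported in the text above.
phyla = {
    "Bacteroidetes": (68.20, 56.74),
    "Fibrobacteres": (1.43, 0.82),
    "Firmicutes": (23.34, 30.32),
    "Spirochaetes": (0.98, 2.17),
}

shifts = {name: abundance_shift(b, a) for name, (b, a) in phyla.items()}
for name, (delta, relative) in shifts.items():
    print(f"{name}: {delta:+.2f} points ({relative:+.1f} % relative)")
```

Expressed this way, the low-abundance phyla show the largest proportional responses: Spirochaetes more than doubles, although its change in percentage points is far smaller than that of Bacteroidetes.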

5 Conclusions

Metagenomics has significantly expanded our knowledge of rumen microbial diversity and the structure and function of the rumen microbiome, thanks to rapid advances in next-generation sequencing technologies and computational tools and resources. The rumen microbiome, consisting of hundreds of microbial species and myriad microbial interactions, plays a critical role in the nutrition and normal physiology of ruminants. It is highly responsive to changes in diet, development, environmental factors, and host genetics. Alterations in the rumen microbiome have important implications for animal well-being and production efficiency. While metagenomics has proven to be a powerful tool in rumen microbiome studies, numerous challenges remain. Temporal and spatial fluctuations as well as intra- and interindividual variations of ruminal microbial composition have yet to be assessed. The development of bioinformatic tools and resources that can cope with fragmented and voluminous next-generation sequencing data is still in its infancy. Notably, the lack of fully assembled and annotated reference genomes of rumen origin in public databases has hindered functional annotation of metagenomic data. The lack of commonly accepted data analysis pipelines and standardized report formats makes direct comparisons of various metagenomic studies difficult, if not impossible. Most importantly, general theories and principles of microbial ecology have yet to be fully applied to rumen metagenomic studies. Mechanistic models are still needed to aid the interpretation of the biological relevance of metagenomic data. The recent launch of the Hungate1000 community sequencing project, which aims to sequence up to 1,000 microbial genomes of rumen origin (http://www.hungate1000.org.nz/), representing a broad spectrum of rumen microbial taxa, will undoubtedly facilitate the assembly and annotation of WGS metagenomic data. It is conceivable that comprehensive studies of the rumen microbiome using metagenomic tools will broaden our understanding of its structure and function and of its role in the normal physiology and pathology of ruminants, which in turn should guide efforts to develop optimal rumen manipulation strategies for more efficient fiber digestion and for reducing the environmental footprint of animal farming.