5.1 Introduction

The term “metagenomics” was first used to describe the composite genomes of uncultured soil microorganisms (Handelsman et al. 1998). In environmental studies, metagenomics, also known as “community genomics,” “ecogenomics,” or “environmental genomics,” is the study of the composite genetic material in an environmental sample. A large number of microbes in different environments, be it air, soil, water, mines, or animals, remain uncultured and are therefore inaccessible to traditional approaches. Humans are constantly exposed to a large and diverse pool of microorganisms, which can reside in, on, and around our bodies. These microbiota and their genomes, collectively called the microbiome, are being characterized by “metagenomics” approaches that integrate next-generation sequencing (NGS) technologies and bioinformatics analysis. The primary focus is on the analysis of reads obtained either by targeted sequencing of the 16S ribosomal RNA hypervariable regions or by whole-genome shotgun DNA sequencing. Such studies have become possible because of advances in genomics and the constant improvement of sequencing technology. In addition, assembly algorithms and annotation pipelines have provided key opportunities for the scientific community. Advances in single-cell genomics, transcriptomics, and metagenomics have revolutionized studies related to cancer genomics, gene expression, metabolic pathways, cellular analysis, environmental analysis, and many more areas. There has been tremendous growth in sequencing, assembly, and annotation at the genomics level. However, for metagenomics there is a critical need to develop new technologies and in-depth analytical approaches. Here, we present a generalized methodology that can be used for sampling and analysis of metagenomics samples acquired from any environmental location.

5.2 Metagenomics: General Methodology

Metagenomics projects utilize various methodologies which depend on the aim, and a standard metagenomics analysis protocol is depicted in Fig. 5.1. The basic steps in metagenomics analysis including sampling, sequencing, metagenome assembly, binning, annotation of metagenomes, experimental procedures, statistical analysis, and data storage and sharing are discussed.

Fig. 5.1 Flow diagram of a typical metagenomic experiment

5.2.1 Sampling and DNA Extraction

The first and most crucial step is sample acquisition, which critically depends on the sample source. Environmental samples collected from specific sites across various time points are analyzed in comparative metagenomics studies, which provide significant insight into both the temporal and spatial characteristics of the microflora. Another important step in a metagenomics analysis is to process the samples efficiently and to ensure that the DNA extracted represents all the cells present in the sample. In addition, sampling and DNA extraction require special considerations that depend on the sample source. For example, in a soil sample, physical separation and isolation of cells are important for maximizing DNA yield and for avoiding the co-extraction of enzymatic inhibitors that may interfere with subsequent sample processing (Delmont et al. 2011). Samples from biopsies or groundwater often yield very small amounts of DNA (Singleton et al. 2011); in such cases multiple displacement amplification can be performed (Lasken 2009) to amplify femtograms of DNA to micrograms.

Handling metagenomics data with precision is a challenge for the scientific community because the large data volume leads to storage issues. Metagenomics data can be exploited for various purposes; therefore, strict and comprehensive guidelines are needed so that data are made publicly available together with properly formatted metadata. Metadata, or “data about the data,” record when, where, and under what conditions the samples were collected. Metadata are as important as the sequence data (Wooley et al. 2010), and the minimum information about a metagenome sequence (MIMS) standard defines formats that minimally describe the environmental and experimental data. The Genomic Standards Consortium (http://gensc.org/), an international group, has standardized the description and exchange of genomes and metagenomes and the rules for the associated metadata.
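As an illustration of the “when, where, and under what conditions” idea, the following minimal Python sketch checks a sample record for missing metadata before submission. The field names and the sample values are hypothetical and chosen only for illustration; they are not taken from the MIMS checklist, which should be consulted for the actual required fields.

```python
# Hypothetical minimal metadata fields (illustrative only, not the MIMS checklist).
REQUIRED_FIELDS = (
    "collection_date",    # when
    "latitude",           # where
    "longitude",          # where
    "environment",        # under what conditions
    "sequencing_method",  # how the data were generated
)


def missing_fields(record):
    """Return the required fields that are absent or empty in a sample record."""
    return [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]


# Hypothetical sample record with one field left empty.
sample = {
    "collection_date": "2017-06-14",
    "latitude": 28.61,
    "longitude": 77.21,
    "environment": "agricultural soil",
    "sequencing_method": "",
}

print(missing_fields(sample))
```

A check of this kind is cheap to run at sampling time, when missing information can still be recovered.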

5.2.2 DNA Sequencing

High-throughput sequencing technologies have revolutionized the fields of genomics and metagenomics. Large genome sequencing projects that were once a dream have become relatively routine owing to advances in NGS, multiplexing, reduced sequencing cost, and improved algorithms. Metagenomics samples are sequenced in the same manner; however, these samples contain both culturable and non-culturable organisms, including many genera that have not yet been explored by genomics. Assigning taxa to a large percentage of the metagenome data is still a challenge. Currently, the majority of metagenomics analyses sequence the 16S rRNA gene, or another particular gene, of the microbial community to trace the community composition; this is not metagenomics in the strict sense and is referred to as metagenetics or metabarcoding. In contrast, in whole-genome (shotgun) sequencing the total DNA of the sample is sequenced rather than a single gene. Of the various NGS technologies, 454/Roche and Illumina/Solexa have been used extensively for sequencing metagenomics samples. 454/Roche generates longer reads, which makes the assignment of a read to a particular operational taxonomic unit (OTU) more reliable than with the very short reads generated by Illumina high-throughput sequencing.

5.2.3 Assembly and Annotation

The majority of current assemblers have been designed to assemble single, clonal genomes, and their utility for assembling and resolving a large number of organisms in a complex mixture has to be evaluated critically. Standard assembly approaches such as de novo assembly and reference mapping are employed in metagenomics data analysis; however, standard assemblers carry a “clonal assumption” that does not allow contig formation for some heterogeneous taxa, and dedicated metagenomics assemblers have therefore been designed to tackle the significant variation at the strain and species level. Among these, de Bruijn graph-based assemblers such as MetaVelvet (Namiki et al. 2012) and Meta-IDBA (Peng et al. 2011) deal explicitly with non-clonality in sequencing data and try to identify a subgraph that connects related genomes. The meta-assemblers are still in development, and assessing their accuracy remains a major goal of developers because no complete reference exists against which the results can be compared. Assembly is more efficient for genome reconstruction when reference genomes of closely related species are available and in samples of low complexity (Luo et al. 2013; Teeling and Glockner 2012). However, low read coverage, a high frequency of polymorphism, and repetitive regions can hamper the process (De Filippo et al. 2012).
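To make the non-clonality problem concrete, the sketch below builds a toy de Bruijn graph from two reads and reports the nodes at which the graph branches. It is a minimal illustration only and not the algorithm used by MetaVelvet or Meta-IDBA; the two “strain” sequences and the choice of k = 4 are hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branching_nodes(graph):
    """Nodes with more than one successor, i.e., candidate sites of variation."""
    return sorted(node for node, nxt in graph.items() if len(nxt) > 1)


# Two hypothetical strains that differ at a single base (G vs. T, sixth position).
strain_a = "ACGTTGCATGCA"
strain_b = "ACGTTTCATGCA"

graph = de_bruijn_graph([strain_a, strain_b], k=4)
print(branching_nodes(graph))  # ['GTT']
```

A single-genome assembler has to treat such a branch as an error or a repeat, whereas in a metagenome it may reflect genuine strain-level variation, which is why the branch cannot simply be collapsed.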

Assembled contigs of 30,000 bp or longer can be annotated through existing genome annotation pipelines, such as rapid annotation using subsystem technology (RAST; Aziz et al. 2008) or integrated microbial genomes (IMG; Markowitz et al. 2007, 2009). For the annotation of entire communities, the standard genome annotation tools are less suitable, and a two-step annotation is preferred. First, genes or features of interest are identified, and second, functional assignments are performed by assigning gene functions and taxonomic neighbors (Thomas et al. 2012). FragGeneScan (Rho et al. 2010), MetaGeneMark (McHardy et al. 2007), MetaGeneAnnotator (Noguchi et al. 2008), and Orphelia (Hoff et al. 2009) are metagenome annotation tools that use gene features, e.g., codon usage, to find the coding regions. Other features such as tRNAs (Gardner et al. 2009; Lowe and Eddy 1997), signal peptides (Bendtsen et al. 2004), or clustered regularly interspaced short palindromic repeats (CRISPRs; Bland et al. 2007; Grissa et al. 2007) can also be identified but might require long contiguous sequences and vast computational resources.

The functional annotations are provided as gene features via gene or protein mapping to an existing nonredundant (NR) protein sequence database. Sequences that cannot be mapped to the known sequence space are termed ORFans and represent the novel gene content. ORFans could be erroneous coding sequence (CDS) calls, biochemically uncharacterized bona fide genes, or genes that have no sequence homology but do have structural homology to existing protein families or folds. Reference databases including the Kyoto Encyclopedia of Genes and Genomes (KEGG; Kanehisa et al. 2004), eggNOG (Muller et al. 2010), clusters of orthologous groups/eukaryotic orthologous groups (COG/KOG; Tatusov et al. 2003), PFAM (Finn et al. 2014), and TIGRFAM (Selengut et al. 2007) are used to provide functional context to metagenomics CDS. Three prominent systems, Metagenome-RAST (MG-RAST; Glass et al. 2010), integrated microbial genomes and microbiomes (IMG/M; Markowitz et al. 2007), and CAMERA (Sun et al. 2011), perform quality control, feature prediction, and functional annotation through standardized protocols and also serve as large repositories of metagenomics datasets. These web servers have a user-friendly graphical interface that assists users in performing taxonomic and functional analysis of metagenomes but, unfortunately, may at times be saturated and are not customizable. Earlier it was reported that standard metagenome annotation tools can annotate only 20–50% of metagenomics sequences (Gilbert et al. 2010); annotation algorithms therefore require further refinement so that sequence and structural homology are taken into account together, which is a major computational challenge.

Pathway reconstruction, one of the annotation goals, could be achieved reliably if there is robust functional annotation. To reconstruct a pathway, every gene should be in an apt metabolic context, missing enzymes should be filled in the pathways, and optimal metabolic states should be found. MinPath (Ye and Doak 2009) and MetaPath (Liu and Pop 2011) use KEGG (Kanehisa et al. 2004) and MetaCyc (Caspi et al. 2014) repositories for building networks. Most of the current platforms are not able to reconstruct variant metabolic pathways (de Crécy-Lagard 2014), since pathways and enzymes are not conserved among different environment and the inhabiting species. A web service implementation by KEGG, GhostKOALA (Kanehisa Laboratories www.kegg.jp/ghostkoala/), relates taxonomic origin of the metagenomes with their respective functional annotation, and the metabolic pathways from different taxa can be visualized in a composite map. Metabolic pathways can be constructed using gene-function interactions, synteny, and copy number of annotated genes and integrating them with the metabolic potential of metagenome consortium.

5.2.4 Taxonomic Classification and Binning

Binning, as the name suggests, groups the sequencing reads that represent an individual genome or the genomes of closely related organisms. The algorithms employed act either as supervised or as unsupervised classifiers, and binning can be based on sequence similarity/alignment, on compositional features, or on both. Compositional binning groups sequences by conserved nucleotide composition that carries weak but detectable phylogenetic signals, e.g., GC content or the abundance distribution of particular k-mers (tetramers or hexamers) (Pride et al. 2003). Similarity-based binning assigns unknown DNA fragments to bins according to their similarity to known genes in a reference database. Compositional binning algorithms have been exploited in PhyloPythia (McHardy et al. 2007) and PCAHIER (Zheng and Wu 2010), whereas similarity-based binning is employed in IMG/M (Markowitz et al. 2007), MG-RAST (Glass et al. 2010), MEtaGenome ANalyzer (MEGAN; Huson et al. 2016), CARMA (Krause et al. 2008), MetaPhyler (Liu et al. 2010), and many more. Some programs such as PhymmBL (Brady and Salzberg 2009) and MetaCluster (Leung et al. 2011) employ both compositional- and similarity-based algorithms. Compositional binning is not reliable for short reads of approximately 100 bp, but if reference data are available, the taxonomic assignment of such reads can be made with a supervised similarity-based method (McHardy et al. 2007). The bins obtained are assigned taxonomy at the phylum level, which is a very high rank and results in chimeric bins composed of two or more genomes that belong to the same phylum. If similarity-based binning algorithms are improved to make assignments at lower taxonomic levels, they may help in creating accurate bins for a specific organism down to at least the species level. Such binned reads can be assembled to obtain partial genomes of yet-uncultured or unknown organisms. Binning the reads before assembly reduces the complexity of the assembly effort and the computational requirements.
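The compositional features mentioned above can be computed in a few lines. The following Python sketch calculates GC content and a tetramer frequency profile and compares profiles by Euclidean distance. It is only an illustration of the features themselves, not the classifier used by any of the tools named above; the three sequences are hypothetical, and Euclidean distance is an assumed, simple choice of metric.

```python
from collections import Counter
from itertools import product

# All 256 tetramers in a fixed alphabetical order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def tetramer_profile(seq):
    """Relative frequency of each tetramer (overlapping windows)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[t] for t in TETRAMERS)
    return [counts[t] / total if total else 0.0 for t in TETRAMERS]


def profile_distance(p, q):
    """Euclidean distance between two tetramer profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


# Hypothetical fragments: two AT-rich, one GC-rich.
frag_a = "ATATATAT" * 5
frag_b = "ATATTATA" * 5
frag_c = "GCGCGGCC" * 5

pa, pb, pc = (tetramer_profile(s) for s in (frag_a, frag_b, frag_c))
print(gc_content(frag_a), gc_content(frag_c))
print(profile_distance(pa, pb) < profile_distance(pa, pc))
```

With fragments this short, each profile rests on only a few dozen tetramers, which illustrates why compositional signals become unreliable for reads of about 100 bp.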

The metabolic potential of the metagenome can be deciphered after the microbial diversity is known. Whole-metagenome approach where whole DNA of the community is sequenced can be used to obtain the complete information of a microbial community. The choice of sequencing platform will influence the computational resources and selection of available software to process the sequencing results. These choices in turn will be reflected in taxonomic species/genus/family level classification. Novel microorganisms identified from the analysis can potentially establish new genes with novel functions.

Taxonomic annotation can be made better by using more than one phylogenetic marker. Metagenome shotgun sequencing allows for the identification of single copy marker genes among various databases. Parallel-META (Su et al. 2014) can be used to extract ribosomal marker genes from metagenomics sequences to conduct taxonomic annotations. Single copy marker genes can be extracted using MOCAT (Kultima et al. 2012) that uses the RefMG database (Ciccarelli et al. 2006), a collection of 40 single copy universal marker genes, and “a pipeline for AutoMated PHylogenOmic infeRence” (AMPHORA; Wu and Eisen 2008), a database with 31 single copy marker genes. This pipeline, distinct from identification of marker genes, performs multiple sequence alignment, distance calculations, and clustering. The reference genomes were used to perform taxonomic annotation at a species-level resolution.

5.2.5 Statistical Analysis

The metagenomics data consists of large number of species, corresponding genes, and their functions as compared to the number of samples analyzed. Thus, multiple hypotheses are to be formed, tested, and implemented for comprehensive presentation of data. Various multivariate statistical visualization programs such as Metastats (White et al. 2009) and R packages, viz., ShotgunFunctionalizeR (Kristiansson et al. 2009), have been built to statistically analyze the metagenome data.
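Because many more features than samples are tested, the resulting p-values are usually corrected for multiple testing. The sketch below implements the Benjamini–Hochberg false discovery rate adjustment, a standard procedure that is not named in this chapter and is shown here only as one common option; the p-values in the example are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four gene families compared between two sample groups.
raw = [0.01, 0.04, 0.03, 0.005]
print([round(q, 3) for q in benjamini_hochberg(raw)])
```

Features whose adjusted value falls below the chosen threshold (for example 0.05) would then be reported as differentially abundant.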

5.2.6 Data Storage and Sharing

Genome research has always been connected to sharing raw data, the final assemblies and annotations; however, to store metagenomics data, database management and storage system are required. All the data is stored at the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI), and other metagenomics repositories. The digital form of data storage is generally preferred, and despite the decreasing cost of generating NGS data, storage costs may not decline (Weymann et al. 2017); therefore, acquiring data storage in a cost-effective manner is also important.

The microbial systems can be very dynamic at different time points, e.g., as in the human gut; therefore, temporal sampling has substantial impact on data analysis, interpretations, and results (Thomas et al. 2012). Due to the magnitude of variation in small-scale experiments (Prosser 2010), a sufficient number of replicates are needed. Samples should be collected from the same habitat and should be processed in a similar fashion. The experimental plan and interpretations, if done carefully, facilitate dataset integration into new or existing theories (Burke et al. 2011). The critical aim of metagenomics projects is to relate functional and phylogenetic information to the biological, chemical, and physical characteristics of that environment and ultimately achieve retrospective correlation analysis.

5.3 Species Diversity

The diversity of species in an environmental sample is a critical question, and a wide range of marker genes has been used to classify metagenomics reads. Marker genes such as the 16S/18S ribosomal DNA (rDNA) sequences have been used to estimate species diversity and coverage in most analyses. rDNA as a marker gene has limitations, including horizontal transfer among microbes (Schouls et al. 2003) and the presence of multiple copies of the marker gene (DeSantis et al. 2006). Other housekeeping genes such as rpoB (Walsh et al. 2004) are strong candidates, and amoA, pmoA, nirS, nirK, nosZ, and pufM (Case et al. 2007) have also been exploited in different contexts as molecular markers.

Quantifying species diversity is not trivial because it must incorporate species richness, the evenness of species, and differential abundance (Simpson 1949). When two communities are compared, if both have the same number of species but their abundances differ, then the community whose abundances deviate least from an assumed even abundance is considered the more diverse.

In ecology and microbial ecology, species diversity is measured as α-diversity, β-diversity, and γ-diversity. The α-diversity is defined as the biodiversity in a defined habitat (i.e., a smaller ecosystem), whereas β-diversity compares species diversity between habitats (or between two ecosystems). The γ-diversity is the total biodiversity over a large region containing several ecosystems (Wooley et al. 2010). Rarefaction curves are used to estimate the coverage obtained from sampling, which indicates whether the species in a particular habitat have been exhaustively sampled. All these indices are calculated in metagenomics data analysis with software and tools including EstimateS (Colwell et al. 2004), Quantitative Insights Into Microbial Ecology (QIIME; Caporaso et al. 2010), and Kraken (Davis et al. 2013). Another way to calculate species diversity is through statistical estimators, in particular nonparametric estimators. Simpson’s index (Simpson 1949) is based on the probability that two individuals taken independently and at random from the community belong to the same species. The Shannon–Wiener index H´ (Shannon 1948) is an entropy measurement that increases with the number of species in the sample and with the evenness of their abundances. These methods are used for heterogeneity measurements and differ primarily in how they weight taxa abundance to arrive at the final richness estimation (Escobar-Zepeda et al. 2015). The Simpson and Shannon–Wiener indices give more weight to the more frequent and to the rare species, respectively (Krebs 2014).
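The indices described above can be written down compactly. The following Python sketch computes the Shannon–Wiener index, H´ = −Σ p_i ln p_i, the Simpson index, Σ p_i², and the expected richness of a random subsample, which is the quantity plotted in a rarefaction curve. It is a minimal illustration and does not reproduce the implementation of EstimateS or QIIME. The Simpson index is given in its simple with-replacement form, and the two communities are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon-Wiener entropy H' = -sum(p_i * ln p_i) over observed species."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_index(counts):
    """Probability that two individuals drawn at random (with replacement)
    belong to the same species; lower values mean higher diversity."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)


def rarefied_richness(counts, n):
    """Expected number of species in a random subsample of n individuals
    drawn without replacement (one point on a rarefaction curve)."""
    total = sum(counts)
    return sum(
        1 - math.comb(total - c, n) / math.comb(total, n)
        for c in counts
        if c > 0
    )


# Two hypothetical communities with the same richness (4 species, 100 individuals).
even = [25, 25, 25, 25]
skewed = [85, 5, 5, 5]

for name, community in (("even", even), ("skewed", skewed)):
    print(
        name,
        round(shannon_index(community), 3),
        round(simpson_index(community), 3),
        round(rarefied_richness(community, 10), 2),
    )
```

Both communities contain four species, yet the even community has the higher Shannon–Wiener value, the lower Simpson value, and the higher expected richness in a subsample of ten individuals, which is the sense in which it is considered more diverse.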

The use of diversity indices which quantify and compare microbial diversity among samples is a better approach as compared to ones based on molecular markers. The species diversity analysis should be done carefully as it can be uninformative. The biases related to sampling should be reduced considering the criteria for species or OTU definition.

5.4 Comparative Metagenomics

The comparison between two or more metagenomes facilitates the understanding of genomic differences and how they are affected by the abiotic environment. Various sequence-based traits such as GC content (Yooseph et al. 2007), microbial genome size (Raes et al. 2007), taxonomy (von Mering et al. 2007), and functional content (Turnbaugh et al. 2006) have been compared to gather biological insights through comparison between two or more metagenomes. Statistical analysis is a necessity to analyze several metagenomics datasets, and principal component analysis (PCA) and nonmetric multidimensional scaling (NM-MDS) have been used to visualize the metagenomics data analysis and reveal major factors that affect the data most (Brulc et al. 2009).
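As a rough illustration of how PCA separates samples, the sketch below extracts the first principal component of a small samples-by-features table using power iteration on the covariance matrix. Power iteration is used here only to keep the example free of external libraries; in practice a numerical library would normally be used. The table of relative abundances of three functional categories in four samples is hypothetical.

```python
def first_principal_component(rows, iterations=200):
    """Return (loadings, scores) of the first principal component.

    rows: list of samples, each a list of feature values (at least 2 samples).
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Arbitrary non-zero start vector; power iteration converges to the
    # eigenvector with the largest eigenvalue unless the start is orthogonal to it.
    vec = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in nxt) ** 0.5
        if norm == 0:
            break
        vec = [x / norm for x in nxt]
    scores = [sum(c[j] * vec[j] for j in range(d)) for c in centered]
    return vec, scores


# Hypothetical relative abundances of three functional categories.
table = [
    [0.60, 0.30, 0.10],  # sample 1, habitat A
    [0.55, 0.35, 0.10],  # sample 2, habitat A
    [0.20, 0.30, 0.50],  # sample 3, habitat B
    [0.25, 0.25, 0.50],  # sample 4, habitat B
]

loadings, scores = first_principal_component(table)
print([round(s, 3) for s in scores])
```

Samples from the same habitat receive scores of the same sign on the first component, and the loadings indicate which features drive the separation. The overall sign of a principal component is arbitrary.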

5.5 Challenges in Metagenomics Analysis

Sequencing of a complex environmental community for metagenomics analysis often represents only a minute fraction of the vast number of culturable and unculturable microorganisms actually present (Desai et al. 2012). To obtain just onefold coverage of the entire community in a gram of soil requires hundreds of millions of reads without guarantee that every member of that community was sequenced. The unknown community composition and relative abundance of microorganisms limits our ability to calculate the coverage robustly. Even perfect 16S amplicon-based characterization of microbial species fails to distinguish between different strains (Desai et al. 2012). Furthermore, no tools are available that determine the availability of sufficient coverage to interpret data of a certain depth for a community. The low coverage data represents randomly subsampled genomic content of the community. Despite complete coverage with millions invested, the analysis of metagenomics data requires tools and protocol development comparable to genomic analysis. Moreover, if the approaches led to the identification of new microbial community members and discovery of new molecules, problems associated with cloning biases, sampling biases, misidentification of “decorating enzymes” and incorrect promoter sites in genomes, and dispersion of genes involved in secondary metabolite production (Escobar-Zepeda et al. 2015) should be considered.
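The scale of the coverage problem can be checked with simple arithmetic. The sketch below computes the number of reads needed for a given average coverage and the expected fraction of bases covered under the Lander–Waterman model of uniformly random reads. The community size (10,000 genomes of 5 Mbp each, at equal abundance) and the read length (150 bp) are hypothetical assumptions; real communities are uneven, which makes the true requirement for rare members far higher.

```python
import math


def reads_for_coverage(total_genome_bp, read_length_bp, coverage=1.0):
    """Reads needed so that reads * read_length = coverage * total genome length."""
    return math.ceil(coverage * total_genome_bp / read_length_bp)


def expected_fraction_covered(coverage):
    """Lander-Waterman expectation for uniformly random reads: 1 - e^(-coverage)."""
    return 1.0 - math.exp(-coverage)


# Hypothetical soil community: 10,000 genomes of 5 Mbp each, 150 bp reads.
community_bp = 10_000 * 5_000_000

print(reads_for_coverage(community_bp, 150))      # 333333334
print(round(expected_fraction_covered(1.0), 3))   # 0.632
```

Under these assumptions, onefold coverage already requires several hundred million reads, and even then only about 63% of the bases are expected to be covered at least once, consistent with the lack of any guarantee that every member of the community has been sequenced.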

Similarly, human metagenomic experiments and analyses have associated limitations and pitfalls, as they are sensitive to the environment, including any particular condition or intervention (Kim et al. 2017). Various factors including diet, drugs, age, geography, and sex have all been reported to influence the function and composition of the human microbiome (Blaser et al. 2013; Dave et al. 2012; Lozupone et al. 2012). Another challenge is longitudinal stability. Unlike the gut, the microbiome of other sites, such as the human vagina, can vary over short periods without always indicating dysbiosis (Williams and Lin 1971). In animal experiments, the prime limitation is the cage effect, which is best studied in mice: animals kept in the same cage can come to share the same microbiome because of coprophagia (Campbell et al. 2012). When it comes to handling and analyzing samples, issues pertaining to low microbial biomass, environmental contamination, and the inclusion of negative/positive control samples should be addressed. The major informatics challenges associated with human metagenome analysis, as with other metagenomes, are the large volume of the data and the heterogeneity of the microbial community. One additional challenge has been the rapid identification of host sequences contaminating metagenomics datasets, which is a time- and memory-intensive process and hence needs to be revisited. There have been efforts to overcome these challenges with tools like CS-SCORE (Haque et al. 2015); however, further algorithmic improvement is needed.

5.6 Applications of Metagenomics

5.6.1 Correlations Between Environmental Data and Metadata

Metagenomics studies aid in investigating genomic potential of the bacterial community and how it is affected by and is affecting its habitat. The correlation between sequence data, environment, and environmental attributes or their correlation among themselves reveals new biological insights. For example, a bivariate metagenome study in obese vs lean mouse reveals that obese individuals are enriched in carbohydrate-active enzymes (Turnbaugh et al. 2006). Multivariate correlation analysis in a nutrient poor ocean habitat revealed covariation in amino acid transport and cofactor synthesis molecules (Gianoulis et al. 2009).

5.6.2 Investigating Symbiosis

Symbiotic relationships occur when two or more organisms live as symbionts; such a system represents small-scale metagenomics and can be analyzed in a similar fashion. The organisms in a symbiotic relationship are few, and their phylogenetic distance from each other makes it easier to sort the reads into separate bins, which can then be assembled separately. Wu and colleagues (2006) exploited such a method to bin the ESS data from bacterial symbionts living in the glassy-winged sharpshooter and inferred that one symbiont synthesizes amino acids for the host insect, while the other produces cofactors and vitamins (Wu et al. 2006).

5.6.3 Gene Family Enrichment

The immense amount of genetic material generated has made it possible to discover new gene families as well as new members of existing gene families. The small bacterial eukaryotic protein kinase-like (ELK) gene family was enriched severalfold through the Global Ocean Sampling (GOS) metagenomics project (Wooley et al. 2010).

5.6.4 Human Microbiome

Symbiotic microbes have coevolved with humans for millions of years and play a critical role in the health of the host. The focus of human microbiome research has been on the bacteria residing in the gut, which represent the most abundant and diverse part of the human microbiome (Consortium 2012). Colonization by these bacteria commences at birth, and the method of delivery (i.e., vaginal or cesarean section) influences the basal community (Dominguez-Bello et al. 2010). Early-life events, such as mode of delivery (Fig. 5.2, adapted from Rutayisire et al. 2016), dietary transitions or restrictions (Bergstrom et al. 2014; Rutayisire et al. 2016), and antibiotic use (Cho et al. 2012), shape the dynamic microbiome of infants. This gradually stabilizes with age and leads to the adult gut microbiota, which is highly resilient to minor perturbations. This longitudinal stability, together with the vast interpersonal diversity of the microbiome, allows identification of ~80% of individuals by their distinct “microbial fingerprint” (Franzosa et al. 2015). The human microbiota communities contribute to various host biological processes, thus deeply influencing human health. Global initiatives have been taken to understand the healthy microbiome and its composition.

Fig. 5.2 Microbiota colonization pattern significantly associated with the mode of delivery during the first 7 days after birth. Bacterial species with a quantified colonization rate are shown. (Adapted from Rutayisire et al. 2016)

5.6.5 Metagenomics in Diseases

Recent findings have emphasized the effect of the gut microbiome on human health and therapeutic response (Scarpellini et al. 2015). The gut microbiome is composed primarily of bacteria, along with viruses and fungi, and has been shown to be modulated in diet-associated insulin resistance in type 2 diabetic patients using a metagenome-wide association analysis (Qin et al. 2012). The gut microbiota has been established as a site of metformin action, and studies of metformin–microbiota interactions have shown that an altered gut microbiota mediates some of metformin’s antidiabetic effects (Wu et al. 2017). The Human Pan-Microbe Communities (HPMC) database (http://www.hpmcd.org/) is an excellent, highly curated, and searchable metagenomic resource that focuses on facilitating the investigation of the human gastrointestinal microbiota (Forster et al. 2016).

Historically, cancer has been associated with different forms of microorganisms. The metagenomics era has revolutionized microbiome profiling, which has boosted the number of studies exploring microbial links to cancer. Several studies on microbes and cancers have shown distinct associations between various viruses and different types of cancer. Human papilloma virus (HPV) causes the majority of cervical, anal, and oropharyngeal cancers (Chaturvedi et al. 2011; Daling et al. 2004; Gillison et al. 2008; Winer et al. 2006). Similarly, Epstein–Barr virus has been found to be responsible for nasopharyngeal carcinoma, Hodgkin’s lymphoma, Burkitt’s lymphoma, etc. (Anagnostopoulos et al. 1989; Henle and Henle 1976; Leung et al. 2014).

5.6.6 Clinical Implications

In translating the role of microbiomes into clinical applications, Danino et al. (2015) engineered a probiotic E. coli to harbor specific gene circuits that produce signals allowing the detection of tumors in urine in the case of liver metastases. This concept was based on the fact that metastasis leads to translocation of the probiotic E. coli to the liver. Metagenomics has also allowed physicians to probe complex phenotypes such as microbial dysbiosis in intestinal disorders (Antharam et al. 2013) and disruptions of the skin microbiome that may be associated with skin disorders (Weyrich et al. 2015). Recently, different bacterial profiles in the breast were observed between healthy women and breast cancer patients. Interestingly, higher abundances of DNA damage-causing bacteria were detected in breast cancer patients, along with a decrease in some lactic acid bacteria known for their beneficial health effects (Urbaniak et al. 2016). Such studies raise important questions regarding the role of the mammary microbiome in assessing the risk of developing breast cancer.

Metagenomics analytics is changing rapidly as tools and analysis procedures evolve in terms of scalability, sensitivity, and performance. The field allows us to discover new genes, proteins, and the genomes of non-cultivable organisms with better accuracy and in less time than classical microbiology or molecular methods. However, no standard tool or method is available that can answer all our questions in metagenomics. The lack of standards reduces reproducibility, and analysis is still carried out on a case-by-case basis. Data management is also a major problem in metagenomics studies, as most institutes lack the computational infrastructure for long-term storage of raw data, intermediate data, and final analyzed datasets.

Comparison between different biomes and different environmental locations will provide insight into the microflora distribution and help understand the environment around us.

All the advances in the field of human metagenomics add up to the profound impact that the microbiome and their metagenomics have on human health in providing new diagnostic and therapeutic opportunities. However, existing therapeutic approaches for modulating microbiomes in the clinic remain relatively underdeveloped. More studies focused on metagenomics of different organs need to be performed, comparing the tissues from healthy versus affected individuals. Further exploration of additive, subtractive, or modulatory strategies affecting the human microbiota and its clinical implementation could potentially be the next big milestone in the field of translational and applied microbiology. The near future challenge is in the accurate manipulation and analysis of the vast amounts of data and to develop approaches to interpret data in a more integrative way that will reflect the biodiversity present in our world. The development of more bioinformatics tools for metagenomics analysis is necessary, but the expertise of scientific community to manipulate such tools and interpret their results is a critical parameter for successful metagenomics studies.