
5.1 Introduction

DNA sequencing was invented about 42 years ago, and the technologies continue to advance. The continuous development of DNA sequencing technology, and of our understanding of the genome, has established a comparatively new scientific discipline: genomics. In genomic studies, DNA purification is an essential step before further manipulation, including sequencing, restriction digestion, ligation, transformation, mutagenesis, and the construction of probes for hybridization. Cloning vectors such as M13mp or pUC in Escherichia coli (E. coli) are very useful for increasing the copy number of DNA (amplification) (Saiki et al. 1988), and bacteriophages such as M13 are also well suited to these purposes (Messing 2001). Early in the development of DNA sequencing, a technique for the semi-automated analysis of DNA sequences was established in which a differently colored fluorophore was used for each of the base-specific reactions for A, C, G, and T (Smith et al. 1986).
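The idea of one color per base can be illustrated with a toy base caller that, at each position, reports the base whose fluorescence channel is brightest. This is a minimal sketch, assuming a hypothetical input of four intensity readings per position; it ignores real complications such as spectral overlap between dyes and is not the method of Smith et al. (1986).

```python
# Hypothetical channel order: one fluorophore color per base-specific reaction.
CHANNEL_TO_BASE = ("A", "C", "G", "T")


def call_bases(intensities):
    """Call one base per position by choosing the brightest of four channels.

    `intensities` is a list of (A, C, G, T) fluorescence readings, one tuple
    per position along the read (an assumed input format).
    """
    calls = []
    for reading in intensities:
        # Index of the channel with the strongest signal at this position.
        brightest = max(range(4), key=lambda i: reading[i])
        calls.append(CHANNEL_TO_BASE[brightest])
    return "".join(calls)


if __name__ == "__main__":
    trace = [
        (0.90, 0.10, 0.00, 0.20),
        (0.10, 0.20, 0.80, 0.10),
        (0.00, 0.70, 0.10, 0.10),
        (0.10, 0.00, 0.20, 0.95),
    ]
    print(call_bases(trace))  # AGCT
```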

Whole-genome random sequencing and assembly were used to determine the complete genome sequence (580,070 base pairs) of Mycoplasma genitalium, which was then the smallest known genome of any free-living organism. Comparison of this genome with that of Haemophilus influenzae suggested that differences in genome content are reflected in profound differences in the metabolic capacity and physiology of the two microbes (Fraser et al. 1995; Fleischmann et al. 1995). Massively parallel sequencing has since emerged as a novel approach that is changing genomics and helps us understand the biology of organisms at the genome level. The rapid development of genomics has revolutionized biomedical research and clinical medicine (Shendure et al. 2017). Notably, DNA sequencing has been used in prognosis (mutational status), diagnosis (DNA- and RNA-based), treatment (identification of therapeutics), and molecular phylogeny (Marian 2011; Koboldt et al. 2013; Smith 2017; Hu et al. 2019; Won et al. 2019).
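Whole-genome random (shotgun) sequencing breaks the genome into many overlapping fragments, and assembly stitches the reads back together by their overlaps. The toy greedy overlap assembler below repeatedly merges the pair of reads with the longest suffix-prefix overlap. It is a simplified sketch, assuming error-free reads, no repeats, and a hypothetical minimum overlap of 3 bases; it is not the assembler used in the cited genome projects, which also had to handle sequencing errors, repeats, and far larger scale.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` that matches a prefix of `b`.

    Returns 0 if no such overlap of at least `min_len` bases exists.
    """
    start = 0
    while True:
        # Look for the first `min_len` bases of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Check that the rest of a (from `start`) is a prefix of b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_len=3):
    """Merge reads by their best overlaps until no overlaps remain."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no remaining overlaps: keep separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


if __name__ == "__main__":
    print(greedy_assemble(["ATGCGT", "GCGTAC", "TACGGA"]))
```

Here the first two reads share the overlap GCGT and are merged first; the result then overlaps the third read by TAC, giving the single contig ATGCGTACGGA.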

DNA barcodes have recently been developed for bacteria, algae, fungi, and protozoa. They enable rapid and accurate species identification, which helps in understanding both the microbial species involved in particular diseases and microbial diversity (Chakraborty et al. 2014). Similarly, next-generation sequencing has facilitated clinical metagenomics (a culture-independent approach) and plays a vital role in moving from research laboratories to clinical applications. This developing technology helps to improve the diagnosis and treatment of communicable and noncommunicable diseases such as cancer, diabetes, and heart attack (Chiu and Miller 2019).

In oncology, methylated-DNA sequencing tools are generating large amounts of methylome data from cancer samples, from which cancer-associated differentially methylated CpG sites (cDMCs) have been consistently identified and catalogued. Incorporating as many cDMCs as possible can improve the precision of cancer detection and, in some cases, help identify cancer subtypes. However, the lack of an established method for analyzing hundreds of cDMCs together has generally hindered their robust use in practice (Jeon et al. 2017). Moreover, high-throughput sequencing (HTS) is increasingly essential for defining cancer diagnoses, with subsequent prognostic and therapeutic implications (Guillermin et al. 2018). The development of NGS technologies has also made it possible to study gene expression in normal and cancer cells (Craig et al. 2016).
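To make the idea of a differentially methylated CpG site concrete, the sketch below computes a per-site methylation level (the fraction of reads reporting a methylated CpG, often called a beta value) and flags sites whose mean level differs between tumor and normal samples by at least a threshold. The data layout, site IDs, and the 0.3 threshold are hypothetical; real studies use statistical tests with multiple-testing correction, and this is not the method of Jeon et al. (2017).

```python
from statistics import mean


def beta_value(methylated, unmethylated):
    """Fraction of reads reporting a methylated CpG at one site in one sample."""
    total = methylated + unmethylated
    if total == 0:
        raise ValueError("site has no read coverage")
    return methylated / total


def find_dmcs(tumor, normal, min_diff=0.3):
    """Return {site: tumor-minus-normal mean beta} for sites passing `min_diff`.

    `tumor` and `normal` map a CpG site ID to a list of
    (methylated, unmethylated) read-count pairs, one pair per sample
    (an assumed data layout).
    """
    hits = {}
    for site in tumor.keys() & normal.keys():
        t = mean(beta_value(m, u) for m, u in tumor[site])
        n = mean(beta_value(m, u) for m, u in normal[site])
        if abs(t - n) >= min_diff:
            hits[site] = round(t - n, 3)
    return hits


if __name__ == "__main__":
    tumor = {"cg01": [(18, 2), (16, 4)], "cg02": [(5, 5), (6, 4)]}
    normal = {"cg01": [(2, 18), (4, 16)], "cg02": [(5, 5), (4, 6)]}
    print(find_dmcs(tumor, normal))
```

In this example only the hypothetical site cg01 is reported, being hypermethylated in the tumor samples by about 0.7.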

Oxford Nanopore Technologies has recently developed the MinION, a portable sequencer that has eased several technical requirements and enables on-site sequencing. This technology has been applied in various fields, such as clinical medicine, environmental science, and biosecurity (Runtuwene et al. 2019). RNA-Seq, meanwhile, has become a widespread method for quantitative gene expression analysis, and it has also helped to annotate transcriptomes in studies of gene expression under various conditions (Blow 2009; Roberts et al. 2011; Garalde et al. 2016). Nevertheless, accurate estimation of gene expression requires accurate genome information. The development of sequencing technology from the conventional Sanger method (Fig. 5.1a) to next-generation sequencing (NGS) techniques has greatly advanced our understanding of genomics, transcriptomics, metagenomics, and metatranscriptomics (Fig. 5.1).
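Quantitative RNA-Seq analysis typically starts from the number of reads assigned to each transcript, and these counts must be normalized before expression levels can be compared, because longer transcripts collect more reads. One widely used normalization is transcripts per million (TPM). The sketch below assumes that read counts and transcript lengths have already been obtained; the transcript names and numbers are hypothetical, and TPM is shown as a common example rather than the method of the cited studies.

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts per transcript to transcripts per million (TPM).

    `counts` and `lengths_bp` map transcript names to read counts and
    transcript lengths in base pairs.
    """
    # Step 1: divide each count by transcript length in kilobases (reads per kb).
    rates = {tx: counts[tx] / (lengths_bp[tx] / 1000) for tx in counts}
    # Step 2: rescale so that the length-normalized values sum to one million.
    scale = sum(rates.values())
    return {tx: rate / scale * 1e6 for tx, rate in rates.items()}


if __name__ == "__main__":
    counts = {"geneA": 100, "geneB": 200, "geneC": 300}
    lengths = {"geneA": 1000, "geneB": 2000, "geneC": 1500}
    print(tpm(counts, lengths))
```

Although geneB has twice as many reads as geneA, it is also twice as long, so both receive the same TPM (250,000), while geneC receives 500,000.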

Fig. 5.1

DNA sequencing technologies. Schematic examples of (a) first-, (b) second-, and (c) third-generation sequencing. Second-generation sequencing is also referred to as next-generation sequencing (NGS) in the text (Shendure et al. 2017. Adapted with permission)

5.2 Next-Generation Sequencing

Modern DNA sequencing technology is known by several names, such as next-generation sequencing (NGS), deep sequencing, massively parallel sequencing, second-generation sequencing, or third-generation sequencing (Fig. 5.1b, c). By contrast, the earlier Sanger method is regarded as conventional, or first-generation, sequencing; it took a decade to complete the final draft of the human genome. Today, NGS is the method mostly used in research (Behjati and Tarpey 2013). An alternative approach, polymerase chain reaction (PCR) colonies, or polonies, has been used to amplify single template molecules, and millions of clones can be amplified at once (Mitra and Church 1999).
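Amplifying clones from single template molecules works only if most amplification sites receive exactly one molecule. Under the common assumption that templates are distributed at random, the number of molecules per compartment (a colony position or, in emulsion-based methods, a droplet) follows a Poisson distribution with mean λ. The sketch below computes the expected fractions of empty, single-template, and multiple-template compartments; the λ values are illustrative assumptions, not figures from Mitra and Church (1999).

```python
import math


def poisson_pmf(k, lam):
    """Probability that a compartment receives exactly k template molecules."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def loading_summary(lam):
    """Summarize template loading for a mean of `lam` (> 0) molecules per compartment."""
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    return {
        "empty": empty,
        "single": single,  # clonal: amplified from one template
        "multiple": 1.0 - empty - single,  # mixed products from 2+ templates
        "clonal_among_occupied": single / (1.0 - empty),
    }


if __name__ == "__main__":
    for lam in (0.1, 0.5, 1.0):
        summary = loading_summary(lam)
        print(lam, round(summary["empty"], 3), round(summary["clonal_among_occupied"], 3))
```

At a mean of 0.1 molecules per compartment, about 95% of occupied compartments are clonal but roughly 90% are empty; raising the mean to 1.0 fills more compartments while lowering the clonal fraction to about 58%. This trade-off is why templates are diluted before clonal amplification.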

The proliferation of genome sequencing projects has encouraged the search for alternative approaches that reduce time and cost. To achieve an approximately 100-fold increase in throughput over Sanger sequencing, an emulsion method for DNA amplification was established together with an instrument for sequencing by synthesis using a pyrosequencing protocol optimized for solid support and picoliter-scale volumes. The efficiency, accuracy, throughput, and robustness of this pyrosequencing system were demonstrated through shotgun sequencing and de novo assembly of the Mycoplasma genitalium genome with 96% coverage at 99.96% accuracy (Margulies et al. 2005). Currently, virome analysis relies on deep sequencing, NGS data, and nucleic acid databases such as those of the National Center for Biotechnology Information (NCBI), the DNA Data Bank of Japan (DDBJ), and the European Molecular Biology Laboratory (EMBL). Two frequently used NGS platforms are Illumina and Ion Torrent, which support maximum fragment lengths of approximately 300 to 400 nucleotides. More recently, NGS technology has been extended to real-time sequencing at the single-molecule level (third generation) (Fig. 5.1c). Single-molecule sequencing is expected to advance various fields of biology, including virology, cancer biology, metagenomics, transcriptomics, and bioremediation (Ramamurthy et al. 2017; Schloss and Handelsman 2004; Bharagava et al. 2019; Jaswal et al. 2019; Costeira et al. 2019; Goto et al. 2019).
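In pyrosequencing, one nucleotide type is supplied per flow, and light is released only when that nucleotide is incorporated. Because the light is roughly proportional to the number of identical bases incorporated, a run of identical bases (a homopolymer) gives a proportionally larger signal. The sketch below decodes such a series of per-flow signals (a flowgram) by rounding each signal to a homopolymer length. The cyclic flow order TACG and the signal values are assumptions for illustration; real base callers must also model noise, which makes long homopolymers a known source of errors.

```python
FLOW_ORDER = "TACG"  # nucleotide supplied in each flow, repeated cyclically (assumed)


def decode_flowgram(signals, flow_order=FLOW_ORDER):
    """Translate per-flow light intensities into a base sequence.

    Each rounded signal is taken as the number of bases of that flow's
    nucleotide incorporated (0 means no incorporation in that flow).
    """
    seq = []
    for i, signal in enumerate(signals):
        base = flow_order[i % len(flow_order)]
        count = max(0, round(signal))  # clamp negative noise to zero
        seq.append(base * count)
    return "".join(seq)


if __name__ == "__main__":
    flowgram = [1.02, 0.05, 2.1, 0.9, 0.1, 0.98, 0.03, 3.05]
    print(decode_flowgram(flowgram))  # TCCGAGGG
```

The signal near 3 in the final G flow is read as the homopolymer GGG; distinguishing, for example, 7 from 8 identical bases is much harder, which is where pyrosequencing errors tend to concentrate.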

Advances in DNA sequencing technology, notably NGS, have supported the study of microbiome diversity and its characterization at the molecular level (Zaura 2012; Cao et al. 2017). Software developed for conventional sequencing technologies is often ill-suited to NGS data, which consist of very large numbers of short reads produced in parallel. An amplification-free approach has also been demonstrated that analyzes the nucleotide sequences of about 0.3 million individual DNA molecules simultaneously (Stein 2011; Harris et al. 2008). These findings have supported comparative analyses in human genomics.
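Much NGS software handles huge numbers of short reads by indexing the reference genome once, so that each read can be placed quickly instead of being compared against the whole genome. The toy mapper below illustrates this seed-and-extend idea: it indexes every k-mer of a hypothetical reference, looks up each read's first k-mer, and accepts candidate positions with few mismatches. It is a sketch under the assumptions that errors are substitutions only and that a read's first k bases are correct; production aligners such as BWA and Bowtie instead use compressed FM-index structures and also handle insertions and deletions.

```python
from collections import defaultdict


def build_index(reference, k):
    """Map every k-mer of the reference to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read, reference, index, k, max_mismatches=1):
    """Return sorted reference positions where `read` aligns with few mismatches."""
    hits = set()
    seed = read[:k]  # the read's first k-mer is used as the seed
    for pos in index.get(seed, []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # read would run off the end of the reference
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches <= max_mismatches:
            hits.add(pos)
    return sorted(hits)


if __name__ == "__main__":
    ref = "GATTACAGATTACCGGTA"
    idx = build_index(ref, 4)
    print(map_read("GATTACC", ref, idx, 4))  # [0, 7]
```

With one allowed mismatch, the read is placed at two positions of the repeated GATTAC motif, illustrating how repeats make short-read placement ambiguous.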

5.3 Application of NGS in Various Fields

NGS provides remarkable insight into the metagenomes, metatranscriptomes, and metabarcodes of organisms such as viruses, archaea, bacteria, and eukaryotes (Faure and Joly 2015; Bruno et al. 2015). NGS tools can sequence DNA at unprecedented speed, enabling scientific achievements previously thought inconceivable as well as new biological applications. However, the enormous volume of data generated by NGS poses significant challenges for data storage, analysis, and management (Zhang et al. 2011). Innovative bioinformatics tools are therefore important for managing NGS data effectively. Using NGS approaches, investigators have identified and analyzed important genes (Bai et al. 2012).
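One simple illustration of why data representation matters for storage: a text file spends 8 bits on each base, but four nucleotides need only 2 bits, so packing four bases into one byte cuts the space for the sequence itself about fourfold. The sketch below shows such 2-bit packing; it is a simplified example that assumes sequences contain only A, C, G, and T (an ambiguous base such as N raises a KeyError) and ignores quality scores, which real storage formats must also handle.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # 2-bit code for each base
BASES = "ACGT"


def pack(seq):
    """Pack an A/C/G/T string into bytes, four bases per byte.

    Returns the packed bytes and the original length, which is needed to
    ignore padding in the last byte when unpacking.
    """
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            byte |= CODE[base] << (2 * j)  # place base j in bits 2j..2j+1
        out.append(byte)
    return bytes(out), len(seq)


def unpack(data, length):
    """Recover the original base string from packed bytes."""
    bases = []
    for i in range(length):
        byte = data[i // 4]
        bases.append(BASES[(byte >> (2 * (i % 4))) & 0b11])
    return "".join(bases)


if __name__ == "__main__":
    packed, n = pack("GATTACAGATTACA")
    print(len(packed), "bytes instead of", n)  # 4 bytes instead of 14
    print(unpack(packed, n))
```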

Genetic analysis of the BReast CAncer genes 1 and 2 (BRCA1/2), which includes assessment of single nucleotide variants and insertions/deletions as well as larger copy number variations (CNVs), has relied on Sanger sequencing and multiplex ligation-dependent probe amplification. With the introduction of NGS, it has become possible to obtain CNV information alongside sequence data (Schmidt et al. 2017). NGS has also been used to investigate the molecular mechanisms of gene regulation in hypertension, and a notable study identified several gene loci associated with different cardiovascular diseases (Costa and Franco 2017). NGS technologies are therefore expected to be very useful for understanding such complex diseases.
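One common idea behind detecting CNVs from NGS data is read depth: a region present in one copy instead of two should receive roughly half the expected number of reads. The sketch below compares per-region read counts in a test sample with a reference of normal copy number, normalizes by the median ratio (assuming most regions are copy-neutral, so the median reflects overall sequencing depth), and scales to a diploid copy number. The region names and counts are hypothetical, and this is not the pipeline of Schmidt et al. (2017).

```python
from statistics import median


def estimate_copy_numbers(sample_depth, reference_depth, ploidy=2):
    """Estimate copy number per region from read-depth ratios.

    `sample_depth` and `reference_depth` map region names to read counts;
    the reference represents samples with normal copy number.
    """
    ratios = {r: sample_depth[r] / reference_depth[r] for r in sample_depth}
    # Assume most regions are copy-neutral, so the median ratio captures
    # differences in overall sequencing depth rather than real CNVs.
    baseline = median(ratios.values())
    return {r: round(ploidy * ratio / baseline, 2) for r, ratio in ratios.items()}


if __name__ == "__main__":
    sample = {"exon1": 200, "exon2": 220, "exon3": 180, "exon4": 100}
    reference = {"exon1": 100, "exon2": 110, "exon3": 90, "exon4": 100}
    print(estimate_copy_numbers(sample, reference))
```

Even though the hypothetical sample was sequenced about twice as deeply as the reference, the median normalization recovers two copies for exons 1 to 3 and flags exon 4 as a likely single-copy deletion.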

With the introduction of NGS tools, genome sequencing has become affordable for routine genetic analyses. It has greatly facilitated understanding of the pathways involved in disease progression and the analysis of rare genetic variants underlying complex traits in large datasets (Weissenkampen et al. 2019). NGS has also significantly extended our ability to identify and characterize the genes and genetic composition of microbial communities, and it has facilitated microbiome analysis through culture-independent metagenomic approaches (Song et al. 2013). In metagenomics, the total genetic material is extracted from an entire community, then processed and sequenced together.
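Because a metagenome mixes reads from every organism in the sample, a basic analysis step is assigning each read to a taxon and summarizing relative abundances. The sketch below is a simplified k-mer classifier: it records which reference taxa contain each k-mer, assigns a read to the taxon matching most of its k-mers, and reports abundances among classified reads. The reference sequences, taxon names, and k = 4 are hypothetical; real classifiers such as Kraken use much longer k-mers, complete reference databases, and lowest-common-ancestor logic on a taxonomy rather than this simple vote.

```python
from collections import Counter


def build_kmer_db(references, k):
    """Map each k-mer to the set of taxa whose reference sequence contains it."""
    db = {}
    for taxon, seq in references.items():
        for i in range(len(seq) - k + 1):
            db.setdefault(seq[i:i + k], set()).add(taxon)
    return db


def classify_read(read, db, k):
    """Assign a read to the taxon sharing the most k-mers with it, or None."""
    votes = Counter()
    for i in range(len(read) - k + 1):
        for taxon in db.get(read[i:i + k], ()):
            votes[taxon] += 1
    if not votes:
        return None  # no k-mer found in any reference
    return votes.most_common(1)[0][0]


def relative_abundance(reads, db, k):
    """Fraction of classified reads assigned to each taxon."""
    assigned = Counter(classify_read(r, db, k) for r in reads)
    assigned.pop(None, None)  # drop unclassified reads
    total = sum(assigned.values())
    if total == 0:
        return {}
    return {taxon: n / total for taxon, n in assigned.items()}


if __name__ == "__main__":
    refs = {"taxonA": "AAAACCCCGGGG", "taxonB": "TTTTGGGGCCCC"}
    db = build_kmer_db(refs, 4)
    reads = ["AAAACCC", "ACCCCGG", "TTTTGGG", "ACGTACG"]
    print(relative_abundance(reads, db, 4))
```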

NGS has also revealed metabolic interactions between the gut microbiome and its host. Genome-scale modeling is an emerging method that has been used to characterize microbe–microbe and host–microbe relationships in various biological environments (Sen and Orešič 2019). However, the mechanisms of these interactions remain unclear. Progressive advances in NGS tools have also supported transcriptomics and metatranscriptomics (Tarkkonen et al. 2017). Furthermore, RNA-Seq data allow the transcriptome to be analyzed without a reference genome, through de novo assembly (Saggese et al. 2018). Because NGS tools generate large datasets, sophisticated in silico tools and skilled personnel are required.
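De novo assembly reconstructs sequences from reads alone, without a reference genome. Many short-read assemblers use a de Bruijn graph, in which each distinct k-mer from the reads becomes an edge between its (k-1)-base prefix and suffix; a path that uses every edge once (an Eulerian path) then spells out the sequence. The sketch below builds such a graph and finds the path with Hierholzer's algorithm. It assumes error-free reads and no repeats longer than k - 1 bases, and returns a single contig; it is an illustration of the technique, not the tool used by Saggese et al. (2018).

```python
from collections import defaultdict


def assemble(reads, k):
    """Reconstruct one contig from error-free reads via a de Bruijn graph.

    Nodes are (k-1)-mers and each distinct k-mer adds an edge from its
    prefix to its suffix; the contig is spelled by an Eulerian path.
    """
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)  # (k-1)-mer -> successor (k-1)-mers
    in_degree = defaultdict(int)
    for kmer in sorted(kmers):  # sorted only to make the output reproducible
        graph[kmer[:-1]].append(kmer[1:])
        in_degree[kmer[1:]] += 1
    # An Eulerian path starts at the node with one more outgoing than incoming edge.
    nodes = sorted(set(graph) | set(in_degree))
    start = next((n for n in nodes if len(graph[n]) - in_degree[n] == 1), nodes[0])
    # Hierholzer's algorithm: follow unused edges, backtracking when stuck.
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    # Consecutive nodes overlap by k-2 bases, so each adds one new base.
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    print(assemble(["ATGGCG", "GGCGTG", "CGTGCA"], 4))  # ATGGCGTGCA
```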

Soil microbial communities are directly affected by natural environmental conditions, and their functions fluctuate accordingly (Barboza et al. 2018). The taxonomic and functional profiles of soil samples can be analyzed by NGS approaches. Bioremediation, which uses microorganisms to clean up the environment, is generally regarded as an effective alternative to conventional physical and chemical methods. Metagenomics, an emerging approach, can be used to identify the active microbial species, beneficial genes, enzymes, and bioactive molecules in a particular environmental sample. Such microbial species or specific genes can then be used for the effective bioremediation of particular hazardous compounds (Marco 2008; Ju and Zhang 2015; Czaplicki and Gunsch 2016; Conrads and Abdelbary 2019).
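Taxonomic profiles produced by NGS are often summarized with a diversity index so that samples, for example soils under different conditions, can be compared. One common choice is the Shannon index, H' = -Σ p_i ln p_i, where p_i is the proportion of reads assigned to taxon i. The sketch below computes it from hypothetical taxon counts; the choice of index is ours for illustration and is not taken from Barboza et al. (2018).

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts.values())
    h = 0.0
    for n in counts.values():
        if n > 0:  # absent taxa contribute nothing
            p = n / total
            h -= p * math.log(p)
    return h


if __name__ == "__main__":
    even = {"taxon1": 25, "taxon2": 25, "taxon3": 25, "taxon4": 25}
    uneven = {"taxon1": 97, "taxon2": 1, "taxon3": 1, "taxon4": 1}
    print(round(shannon_index(even), 3), round(shannon_index(uneven), 3))
```

Both hypothetical communities contain four taxa, but the evenly distributed one reaches the maximum value ln 4 (about 1.386), while the community dominated by one taxon scores much lower.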

In addition, a range of omics approaches, such as metagenomics, metatranscriptomics, metaproteomics, metabolomics, and fluxomics, are being used to characterize, identify, and select particular microbial strains (Schloss and Handelsman 2004; Bharagava et al. 2019). A multi-omics approach combining metagenomics, metatranscriptomics, metaproteomics, and metabolomics therefore provides an excellent way to understand the metabolic pathways and microbes involved in the bioremediation of particular contaminants at environmental sites. Applying these approaches to the design of microbial consortia may help provide specific microbial strains for degrading particular contaminants in the environment.
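As one simple illustration of combining two omics layers, a gene detected in the metagenome is not necessarily active; comparing its transcript abundance (metatranscriptomics) with its gene abundance (metagenomics) gives an RNA:DNA ratio, and genes with high ratios are candidates for active expression, for example of degradation pathways. The sketch below is an illustrative heuristic assumed here, not a method from the cited studies; it expects abundances that were already normalized, uses hypothetical gene names and values, and adds a pseudocount to avoid division by zero.

```python
def activity_ratios(dna_abundance, rna_abundance, pseudocount=1.0):
    """RNA:DNA ratio for each gene found in the metagenome.

    Both inputs map gene names to normalized abundances from
    metagenomic (DNA) and metatranscriptomic (RNA) data.
    """
    return {
        gene: (rna_abundance.get(gene, 0.0) + pseudocount) / (dna + pseudocount)
        for gene, dna in dna_abundance.items()
    }


def likely_active(dna_abundance, rna_abundance, threshold=2.0):
    """Genes whose transcripts are enriched relative to gene copies."""
    ratios = activity_ratios(dna_abundance, rna_abundance)
    return sorted(gene for gene, ratio in ratios.items() if ratio >= threshold)


if __name__ == "__main__":
    dna = {"gene1": 10, "gene2": 10, "gene3": 5}
    rna = {"gene1": 50, "gene2": 5, "gene3": 20}
    print(likely_active(dna, rna))  # ['gene1', 'gene3']
```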

5.4 Challenges of DNA Sequencing

Advances in DNA sequencing technology such as NGS, together with the databases of the International Nucleotide Sequence Database Collaboration (INSDC) and in silico tools (software), greatly help to generate genome sequences rapidly and to understand the functional genomics of any organism. However, the analysis of NGS data requires sophisticated in silico tools, expensive infrastructure, and skilled personnel (Iacoangeli et al. 2019). Interpreting and characterizing NGS data is challenging at several steps, including sequencing errors, storage, algorithms, and statistical analyses (Zaura 2012).
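Sequencing errors are usually quantified per base with Phred quality scores, where Q = -10 log10(p) and p is the estimated probability that the base call is wrong; Q20 thus means a 1% error chance and Q30 means 0.1%. For example, the 99.96% accuracy reported for pyrosequencing (Margulies et al. 2005) corresponds to an error rate of 0.0004, or Q of about 34. The sketch below converts between the two scales, decodes a FASTQ quality string (assuming the common Phred+33 encoding), and applies a simple mean-quality filter whose Q20 threshold is a hypothetical choice.

```python
import math


def phred_to_error(q):
    """Error probability for a Phred quality score: p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred score for an error probability: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


def decode_quality(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def passes_filter(quality_string, min_mean_q=20):
    """Keep a non-empty read only if its mean base quality reaches `min_mean_q`."""
    scores = decode_quality(quality_string)
    return sum(scores) / len(scores) >= min_mean_q


if __name__ == "__main__":
    print(round(error_to_phred(1 - 0.9996), 1))  # about 34.0
    print(decode_quality("I5#"))  # [40, 20, 2]
    print(passes_filter("IIII"), passes_filter("####"))  # True False
```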

In the recent past, rapid technological developments led by academic institutions and companies have continued to extend NGS methods from basic research to clinical applications. However, implementing NGS involves various challenges, such as sequence analysis, data storage, and quality control (Xuan et al. 2013; Vallenet et al. 2017). Genome-wide association studies (GWAS) have become a widely applied technique for identifying genetic loci associated with disease by examining numerous markers throughout the genome. In recent years, NGS has gained popularity because it can analyze a much larger number of markers across the genome, and NGS platforms have been used to examine a higher number of single nucleotide polymorphisms (SNPs) in GWAS (Alonso et al. 2015). In GWAS, detecting small effects requires large sample sizes, which are usually achieved through meta-analysis in which consortia exchange summary statistics. NGS studies, by contrast, test groupwise for the association of multiple potentially causal alleles within each gene. To address this, MetaSeq, a procedure for meta-analysis of genome-wide sequencing data, was developed and made publicly available as open source (Singh et al. 2013). The results obtained by NGS require thorough analysis, because their biological relevance is often not well understood.
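Meta-analysis from exchanged summary statistics is commonly done, for a single variant, with a fixed-effect inverse-variance weighted model: each study's effect estimate is weighted by 1/SE², so more precise studies count more, and the pooled estimate gets a smaller standard error than any single study. The sketch below shows this standard single-variant calculation with hypothetical study results; it is not MetaSeq, which addresses the gene-level groupwise tests used in sequencing studies.

```python
import math


def fixed_effect_meta(betas, std_errors):
    """Inverse-variance weighted fixed-effect meta-analysis of one variant.

    Returns (pooled effect, pooled standard error, z-score, two-sided p-value).
    """
    weights = [1 / se ** 2 for se in std_errors]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return pooled_beta, pooled_se, z, p_value


if __name__ == "__main__":
    # Two hypothetical studies reporting effect sizes and standard errors.
    beta, se, z, p = fixed_effect_meta([0.10, 0.20], [0.05, 0.05])
    print(round(beta, 3), round(se, 4), round(z, 2), p)
```

With two equally precise hypothetical studies, the pooled effect is their average (0.15), and the combined evidence (z of about 4.24) is stronger than either study alone provides.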

Numerous emerging applications, such as targeted exome sequencing, chromatin immunoprecipitation sequencing (ChIP-Seq), and whole transcriptome shotgun sequencing (RNA-Seq), have been established for various biological purposes. Exome sequencing (Mamanova et al. 2010) avoids the high cost of sequencing the whole genome by selectively sequencing the exonic regions, which may be of more direct interest, and omitting the intronic regions. ChIP-Seq (Johnson et al. 2007) is used to study protein–DNA interactions, while RNA-Seq (Mortazavi et al. 2008) uses NGS technologies to sequence cDNAs. The assembly of NGS reads remains a challenging task. This is particularly true for environmental samples analyzed by metagenomic approaches, which may contain enormous microbial diversity (Warnke-Sommer and Ali 2016). Moreover, the lack of an available reference genome makes it difficult to draw conclusions from the analyzed data (Hiendleder et al. 2005). Nonetheless, NGS has facilitated understanding of the microbiomes associated with infection, the environment, and bioremediation, as well as the diversity of agriculturally important microorganisms. Solid-state nanopore-based sequencing has recently been developed and has demonstrated technological advantages such as processability, robustness, and large-scale integrability (Goto et al. 2019), and other sequencing platforms are under development.
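A central step in ChIP-Seq analysis is finding genomic regions where immunoprecipitated reads pile up far more than expected by chance, which marks likely protein binding sites. A simple way to express this is to count reads in fixed windows and ask how improbable each count is if reads fell at a uniform background rate, modeled as a Poisson distribution. The sketch below does this; the background rate (which in practice might be estimated from an input control), the window counts, and the significance threshold are hypothetical, and peak callers such as MACS build on a similar Poisson model with further refinements such as locally estimated background rates.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def enriched_windows(window_counts, background_rate, alpha=1e-5):
    """Indices of windows whose read count is improbably high under the background."""
    return [
        i for i, count in enumerate(window_counts)
        if poisson_sf(count, background_rate) < alpha
    ]


if __name__ == "__main__":
    counts = [4, 6, 25, 3, 20]  # hypothetical ChIP reads per window
    print(enriched_windows(counts, background_rate=5.0))  # [2, 4]
```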

5.5 Concluding Remarks

This chapter has reviewed innovations in DNA sequencing technology, such as second- and third-generation sequencing techniques, which have demonstrated efficiency, accuracy, throughput, and robustness. These techniques also help to generate large datasets, such as metagenomes and metatranscriptomes, rapidly. Moreover, the availability of biological databases such as NCBI-GenBank, DDBJ, and EMBL-EBI (European Bioinformatics Institute), together with in silico tools (software), considerably helps us understand the functional genomics of organisms. In microbiology, metagenomics, which exploits NGS technology and these databases, is a powerful culture-independent approach for understanding microbial diversity and mining specific genes from samples such as environmental samples, the human gut, and the rumen. The advantages of NGS include high throughput, low cost, and data accuracy, which have driven the exponential growth of NGS datasets. However, sequencing artifacts such as low-quality and contaminated reads must be addressed during NGS analysis. Further progress in these fields will require DNA sequencing methods that combine high accuracy, long read lengths, and high throughput.