1 Third-Generation Sequencing Methods

Despite the advantages of next-generation sequencing methods, soaring expectations in the field have driven the demand for even better technologies (see Chap. 5). Therefore, a new class of sequencing methods known as third-generation sequencing or next-next-generation sequencing is being developed in the hopes of elevating the platform to a whole new dimension [1]. Ideally, third-generation sequencing methodology should reduce or eliminate some or all of the three main challenges faced by the next-generation techniques, i.e., excessive machine costs, short read lengths, and significant error rates. To date, three methods have been introduced that can be considered either third-generation methods or transitional between the next-generation and third-generation tools.

1.1 Heliscope Single-Molecule Sequencing

Heliscope Single-Molecule Sequencing (or Helicos Single-Molecule Fluorescent Sequencing) is the first single-molecule sequencing (SMS) method that can directly identify the exact sequence of a given DNA stretch [2]. In this technique, the DNA to be sequenced is sheared and the resulting fragments are given poly-A tails, which allow them to be anchored to a flow cell surface. A single type of fluorescently labeled nucleotide is added in each cycle to extend the DNA by one nucleotide per cycle. After the addition of each nucleotide, the reaction is paused using a terminating nucleotide in order to capture an image of the fluorescent label. Subsequently, the flow cell surface is washed and the block is removed so that the cycle can be repeated [3]. This technology was developed by Helicos Biosciences and was used in 2009 to sequence a whole human genome (that of Stephen Quake, a professor at Stanford University, USA, and a co-founder of Helicos Biosciences) for less than 50,000 dollars [2]. It was also used to sequence the genome of the M13 bacteriophage [4]. However, by the end of 2012, Helicos Biosciences had shut its doors and filed for bankruptcy.
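The cyclic chemistry described above can be illustrated with a toy simulation. The sketch below is not the Helicos software; the function name, the flow order "CTAG", and the cycle limit are assumptions chosen for illustration. It assumes an error-free reaction in which the terminator limits extension to one base per cycle, as stated in the text.

```python
def read_by_cyclic_addition(strand, flow_order="CTAG", max_cycles=400):
    """Toy model of one-nucleotide-type-per-cycle sequencing.

    strand     : sequence of the strand being synthesized (hypothetical input)
    flow_order : order in which the four labeled nucleotides are offered
                 (an assumption, not taken from the chapter)
    Returns the bases read and the number of cycles used.
    """
    read = []
    position = 0
    cycles = 0
    while position < len(strand) and cycles < max_cycles:
        offered = flow_order[cycles % len(flow_order)]
        cycles += 1
        # A signal is imaged only if the offered nucleotide matches the
        # next base; the terminator limits extension to one base per cycle.
        if strand[position] == offered:
            read.append(offered)
            position += 1
        # The wash and unblocking steps are implicit in moving to the next cycle.
    return "".join(read), cycles


if __name__ == "__main__":
    bases, used = read_by_cyclic_addition("CCTA")
    print(bases, used)
```

In this model a cycle that offers a non-matching nucleotide produces no signal, so the number of cycles needed generally exceeds the read length.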

1.2 Single-Molecule Real-Time Sequencing

The single-molecule real-time (SMRT) sequencing technique is another SMS method that is based on the principle of sequencing by synthesis. It utilizes small well-like containers with a single DNA polymerase enzyme affixed at the bottom of a structure called the zero-mode waveguide (ZMW) [5]. Each ZMW contains a polymerase enzyme and a DNA fragment as a template, and creates an observation volume that is sufficiently illuminated to view a single nucleotide as it is incorporated by the DNA polymerase. This observation is accomplished by capturing the fluorescent label of the incorporated nucleotide with a detector [6]. The SMRT sequencing technology was developed by Pacific Biosciences and is currently implemented in their commercial sequencing machines, where the actual sequencing is performed on a chip that contains several ZMWs (see below).

1.3 Nanopore Sequencing

The nanopore sequencing method was first introduced in the mid-1990s as a technique for determining the nucleotide order in a DNA sequence [7]. The technique is based on the utilization of a surface comprising pores of 1 nm diameter. The passage of DNA through a pore alters the ion current through that pore. This effect is indicative of the types of nucleotides present, as current changes depend on the shape, size, and length of the DNA molecules being sequenced. Thus, each nucleotide can be identified based on its corresponding ion blockage time. Nanopore sequencing is a promising and low-cost method that does not require modified nucleotides, chemical labeling, or PCR amplification [8].
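The idea of reading bases from an electrical signal can be sketched as a nearest-level classifier. The reference levels below are invented for illustration and do not come from any instrument. The sketch also assumes that each translocation event has already been reduced to one mean blockade current, whereas real signals depend on several neighboring bases and on blockage duration.

```python
# Hypothetical mean blockade currents (picoamperes) for each nucleotide.
# These values are assumptions for illustration only.
REFERENCE_LEVELS_PA = {"A": 52.0, "C": 48.0, "G": 56.0, "T": 44.0}


def call_base(current_pa):
    """Return the nucleotide whose reference level is closest to the measurement."""
    return min(REFERENCE_LEVELS_PA,
               key=lambda base: abs(REFERENCE_LEVELS_PA[base] - current_pa))


def call_sequence(event_currents_pa):
    """Call one base per translocation event (one mean current per event)."""
    return "".join(call_base(current) for current in event_currents_pa)


if __name__ == "__main__":
    print(call_sequence([51.5, 44.8, 55.2, 47.9]))
```

The closer the reference levels lie to one another relative to the measurement noise, the more often such a classifier will miscall a base, which is one way to picture the error rates discussed later for nanopore instruments.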

The major challenge of utilizing the nanopore method is the preparation involved in developing the nanopore surface, which can be either a solid-state nanopore surface or a protein-based nanopore surface. Solid-state surfaces are used in solid-state nanopore sequencing techniques such as sequencing with fluorescent labels [9]. On the other hand, protein-based nanopore sequencing employs proteins such as alpha-hemolysin and Mycobacterium smegmatis porin A (MspA) as nanopore surfaces [10–12]. Nanopore sequencing is still in the developmental stages, and thus far has not been commercially available [13, 14].

2 Third-Generation Sequencing Platforms

2.1 HeliScope Single-Molecule Sequencer

The Heliscope Single-Molecule Sequencer was the first commercialized SMS instrument, developed by Helicos Biosciences in 2009. It implements the Heliscope SMS technology that was developed by the same company and represents a revolutionary sequencing paradigm that allows the sequencing of about one billion molecules in about 7 days, a rate 1,000-fold higher than that of the technology available when it was first released [2]. It uses novel reagents that allow digital measurement of homopolymer sequences as well as a new alignment algorithm to perform whole genome assembly (reference-based assembly). The sequencer reads are between 24 and 70 bp, which is very short relative to expectations for a third-generation product. However, the higher speed of sequencing and lower associated costs are the significant strengths of the platform.

The Heliscope Single-Molecule Sequencer was used to sequence the genome of one of the co-founders of Helicos Biosciences (referred to as Patient Zero or P0 in the published article), with promising results [2]. Four sequencers were used to sequence the whole human genome and the results were mapped to ~90 % of the reference genome with a coverage depth near a Poisson distribution [2]. However, Helicos Biosciences closed down at the end of 2012 and, therefore, the Heliscope Single-Molecule Sequencer was excluded from comparisons in this chapter.
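The statement that coverage depth follows a near-Poisson distribution can be made concrete with the standard idealized model, in which reads land uniformly at random on the genome. The sketch below is not the analysis from the cited study, and the read count, read length, and genome size in the usage example are hypothetical. Under this model the mean depth is c = N × L / G, the depth at a given base is Poisson distributed with mean c, and the expected fraction of bases covered at least once is 1 − e^(−c).

```python
import math


def mean_coverage(n_reads, read_length_bp, genome_length_bp):
    """Mean depth c = N * L / G under uniform random read placement."""
    return n_reads * read_length_bp / genome_length_bp


def depth_probability(depth, c):
    """Poisson probability that a base is covered exactly `depth` times."""
    return math.exp(-c) * c ** depth / math.factorial(depth)


def fraction_covered(c):
    """Expected fraction of bases covered by at least one read."""
    return 1.0 - math.exp(-c)


if __name__ == "__main__":
    # Hypothetical numbers chosen only to exercise the formulas.
    c = mean_coverage(n_reads=1_000, read_length_bp=30, genome_length_bp=10_000)
    print(f"mean depth: {c:.1f}")
    print(f"fraction covered at least once: {fraction_covered(c):.3f}")
    print(f"P(depth = 0): {depth_probability(0, c):.3f}")
```

Real data deviate from this idealization wherever sequence composition or repeats bias read placement or mapping, which is why reported depth distributions are described as near-Poisson rather than exactly Poisson.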

2.2 PacBio RS II

PacBio RS is a DNA sequencing system developed by Pacific Biosciences. The PacBio RS systems (PacBio RS and PacBio RS II) are single-molecule sequencers that implement the SMRT sequencing technology developed by the same company. These can be considered as genuine third-generation sequencers with a read length that is >3,000 bp, which is one of the longest available read lengths to date. The sequencer is compact with a short run time (~10 h). However, it is very expensive and still suffers from high error rates and a low total number of reads per run (Tables 6.1, 6.2, 6.3, and 6.4) [13, 14].

Table 6.1 Comparison of major third-generation sequencers advantages and disadvantages
Table 6.2 Comparison of major third-generation sequencers run time, read length, and output data
Table 6.3 Comparison of major third-generation sequencers purchase and operation costs
Table 6.4 Comparison of major next-generation sequencers errors and error rates

2.3 Oxford Nanopore GridION

The Oxford Nanopore GridION sequencers are sequencing machines that implement the Nanopore sequencing methodology. The sequencers are being developed by Oxford Nanopore Technologies Ltd. (UK), which had originally announced that their first commercialized instrument would be available by the end of 2013 [15]. However, at the time of manuscript preparation, it had not yet launched.

The Oxford Nanopore GridION systems promise small, inexpensive and high-throughput sequencers with an unprecedented long read length of ~10,000 bp. According to the product page on the company website [15], the Oxford Nanopore GridION can be used as a single desktop machine or stacked in racks in a similar manner to computer servers. Furthermore, it is stated that the instrument does not require a dedicated server and utilizes a single-use disposable cartridge that contains all the reagents necessary for the experiment. The available information on the performance of the Oxford Nanopore GridION systems shows a relatively high error rate (~4 %), though this rate does not rise upon increasing the read length [13, 14].

3 Sequencing Methods Under Development

We have previously discussed the rapid rate at which methodology has been developed in the DNA sequencing field, and how this fact has helped alleviate prior technical challenges. Moreover, several additional methods are currently in development and hold the promise of making DNA sequencing cheaper, easier, faster, and more accurate. The ultimate goal of these developments is to make whole human genome DNA sequencing as simple and affordable as other standard laboratory procedures. This would allow its widespread utilization towards innumerable clinical applications such as personalized medicine, and would augment research to unprecedented levels [16]. In this section, we will discuss methodologies that are presently in the developmental phase as well as their expected outcomes.

3.1 Solution-Based Hybridization Sequencing

The idea behind sequencing by hybridization is not a new one and has been previously presented [17]. Sequencing by hybridization involves a nonenzymatic approach based on the creation of a hybrid between the DNA molecule of interest and another molecule of known sequence. When one short strand of DNA binds to its complementary strand, the binding becomes very sensitive to mismatches, even at the level of a single base. Thus, the sequence of the complementary strand can be inferred from the sequence of its hybrid. The method requires a library of DNA probes (short single-stranded DNA sequences) based on the organism of interest, its variants, or its single-base variations, and can be accomplished using DNA chips or microarrays [17]. The technique has several advantages, including homogeneous coverage, though the preliminary requirement of DNA and the need for a significant amount of chemicals limit its overall utility. However, the recent introduction of solution-based hybridization has drastically reduced the dependency on chemicals and expensive equipment [18, 19].
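The inference step of sequencing by hybridization can be sketched as a reconstruction from the set of probes that hybridized. The sketch below is a simplified illustration, not the method of the cited reports. It assumes error-free hybridization, probes of equal length k, and a target in which every (k − 1)-base overlap is unique, so that each probe has exactly one successor; the function names are hypothetical.

```python
def spectrum(sequence, k):
    """All length-k substrings of a sequence (the ideal set of hybridizing probes)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def reconstruct_from_spectrum(probes):
    """Rebuild a sequence by chaining probes that overlap by k - 1 bases.

    Raises ValueError when the chain is ambiguous or broken, which is
    what happens with repeats or missing probes.
    """
    probes = set(probes)
    k = len(next(iter(probes)))
    suffixes = {probe[1:] for probe in probes}
    # The first probe is the one whose prefix is not the suffix of any probe.
    starts = [probe for probe in probes if probe[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("cannot determine a unique starting probe")

    by_prefix = {}
    for probe in probes:
        by_prefix.setdefault(probe[:-1], []).append(probe)

    sequence = starts[0]
    used = {starts[0]}
    while len(used) < len(probes):
        candidates = [p for p in by_prefix.get(sequence[-(k - 1):], [])
                      if p not in used]
        if len(candidates) != 1:
            raise ValueError("ambiguous or broken probe chain")
        sequence += candidates[0][-1]
        used.add(candidates[0])
    return sequence


if __name__ == "__main__":
    probes = spectrum("ATGCGTAC", 4)
    print(reconstruct_from_spectrum(probes))
```

The ValueError branches correspond to the practical limits of the approach: repeated subsequences longer than k − 1 bases make the chain ambiguous, which is one reason longer probes or additional information are needed for complex targets.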

3.2 Tunneling Current DNA Sequencing

The novel approach of identifying a DNA sequence and differentiating between the four types of nucleotides through the use of electrical signals was first presented via nanopore sequencing [8]. Based on these findings, the Tunneling Current DNA Sequencing method identifies specific nucleotides through the tunneling current conducted by single-base molecules as they pass through a channel comprising a pair of nanoelectrodes [10, 20, 21]. The differing structures of the nucleotides have varied effects on the current during this process. Thus, differentiating between them is possible through the identification of the characteristic changes in the current caused by each nucleotide. A recent report also presented a hybrid method that combined single-base electrical identification and random sequencing to allow successful sequence reads from nine different DNA oligomers and microRNA [21]. The method promises an elevated sequencing speed in comparison to those currently available.

3.3 Microscopy-Based DNA Sequencing

Microscopy-based DNA Sequencing utilizes an electron microscope to directly visualize the nucleotide sequence of intact DNA molecules. In this approach, nucleotides are enzymatically modified to contain atoms with higher atomic number that can be directly visualized and identified by the electron microscope. Using this technique, an intact synthetic molecule of length >3,200 bp and an intact viral DNA of length >7,000 bp were sequenced successfully, proving the potential of this methodology in the sequencing of long intact DNA molecules [22].

3.4 Mass Spectrometry-Based DNA Sequencing

Mass spectrometry (MS) is well known as the technology of choice in the study of proteins and the identification of amino acid sequences [23]. Additionally, it is utilized in the study of metabolites via the capillary electrophoresis mass spectrometry (CE-MS) approach [24]. For the purposes of DNA sequencing, electrospray ionization time-of-flight mass spectrometry (ESI-TOF MS) and matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS) were used to determine the nucleotide sequence of DNA through the examination of nucleotide mass. This contrasted with previous methodology that employed the study of nucleotide size, structure, fluorescent labeling, or radioactive labeling [25, 26]. Since each type of nucleotide has its own unique chemical structure, each of them also possesses a unique mass. Therefore, spectrometry can be used to identify the nucleotide sequences accurately and in high resolution. This method was found to be more effective with RNA, so the DNA is converted to RNA prior to the sequencing process. An early attempt to use MS for DNA sequencing showed that the longest read in the procedure could be 100 bp [27]. In more recent studies, MS-based DNA sequencing has been used to identify SNPs in pathogens [26] and to compare human mitochondrial DNA with DNA from the bones of dead soldiers during a forensic investigation [28].
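The principle that each nucleotide has a distinct mass can be illustrated by reading a sequence from a ladder of fragment masses, where each fragment is one residue longer than the previous one. The sketch below is an assumption-laden illustration rather than the workflow of the cited studies. It uses approximate average masses of deoxynucleotide monophosphate residues (the RNA-based workflow described above would use ribonucleotide masses instead), ignores end groups and ion adducts, and assumes a complete, noise-free ladder.

```python
# Approximate average residue masses in daltons (dNMP minus water).
# Values are rounded and intended for illustration only.
RESIDUE_MASS_DA = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def ladder_masses(sequence):
    """Cumulative masses of fragments of length 1, 2, ..., len(sequence)."""
    total = 0.0
    masses = []
    for base in sequence:
        total += RESIDUE_MASS_DA[base]
        masses.append(total)
    return masses


def read_ladder(masses, tolerance_da=1.0):
    """Infer the sequence from mass differences between adjacent ladder fragments."""
    sequence = []
    previous = 0.0
    for mass in sorted(masses):
        delta = mass - previous
        base = min(RESIDUE_MASS_DA,
                   key=lambda b: abs(RESIDUE_MASS_DA[b] - delta))
        if abs(RESIDUE_MASS_DA[base] - delta) > tolerance_da:
            raise ValueError(f"mass step {delta:.2f} Da matches no nucleotide")
        sequence.append(base)
        previous = mass
    return "".join(sequence)


if __name__ == "__main__":
    print(read_ladder(ladder_masses("GATTACA")))
```

Because the smallest difference between two residue masses here is about 9 Da, the required mass accuracy grows with fragment length, which is consistent with the limited read lengths reported for MS-based approaches.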

3.5 RNA Polymerase Sequencing

RNA polymerase (RNAP) sequencing involves the utilization of an RNAP enzyme that is attached to a polystyrene bead, while the DNA molecule to be sequenced is attached to another bead; the two beads are then placed in optical traps. The sequencing information is obtained from the movement of the enzyme along the nucleic acid and the sensitivity of the optical trap. During transcription, the motion of the RNAP brings the two beads closer together, which can be recorded at single-nucleotide resolution (in the Angstrom range). The differentiation between the four types of nucleotides is then accomplished using a method similar to the Sanger approach. The concentration displacement of the four types of nucleotides over the transcription time is compared and used to pinpoint the specific types of the nucleotides in the sequence [29, 30].

In addition to the above, several other sequencing methods and instruments are currently either in the research phase or at the initial stages of commercialization. These include in vitro virus high-throughput sequencing [31] and microfluidic Sanger sequencing [32], for instance. However, due to text limitations, it is not possible to discuss them all within the confines of this book. Reports that survey or compare upcoming methods and platforms are readily available [13, 30], though the rapid pace of the field necessitates sources that are frequently updated such as the NGS Field Guide [33].