11.1 Introduction

Genetic variation is one of the reasons why individuals respond differently to the same drug [1]. For instance, a polymorphism in the CYP2C19 gene (rs4244285) affects the metabolism of clopidogrel [2] and of the proton pump inhibitor omeprazole [3], thereby affecting their efficacy. Besides gene polymorphisms, some somatic mutations have become important markers for guiding the use of targeted anti-tumor drugs in individual patients [4]. With the wide application of next generation sequencing technology, more and more drug-related gene mutations have been found, and some genetic variants have become effective targets for individualized drugs in clinical applications [5], promoting the advance of precision medicine. Consequently, accurate and reliable tools for detecting genetic variations are required in both scientific research and clinical applications. Genotyping methods based on several different principles are currently available in pharmacogenomics, such as allele-specific probe hybridization, allele-specific primer extension, allele-specific enzyme digestion, and ligation, and various detection platforms have been developed for these methods. Here we briefly introduce some typical genotyping technologies in terms of their principles and the corresponding detection platforms.

11.2 Principles of Genotyping Technologies

11.2.1 Allele-Specific Hybridization

Hybridization is an intrinsic feature of DNA. Two complementary oligonucleotide strands can hybridize into a double-stranded nucleic acid fragment by base pairing under suitable ionic strength and temperature (Fig. 11.1). The process is reversible: when the temperature rises to a certain point, the duplex is denatured into single strands. The temperature at which half of the double-stranded DNA is denatured is defined as the melting temperature (Tm) of the DNA. The Tm depends mainly on the ionic strength of the solution, the concentration of the oligonucleotide, and its sequence. Commonly, a duplex containing a one-base mismatch has a lower Tm than the fully complementary duplex, so the mismatched DNA hardly forms a duplex at the Tm of the complementary DNA because it is less stable. Therefore, the single-base difference between the alleles of a SNP can be distinguished at a suitable temperature by monitoring hybridization behavior. Many strategies have been proposed for monitoring hybridization, such as measuring the fluorescence change of double-stranded DNA-binding fluorescent dyes [6] and fluorescent probe hybridization [7]. Accordingly, many technology platforms based on allele-specific hybridization have been developed, such as high resolution melting curve analysis (HRM) [8], SNP arrays [9], and molecular beacon-based SNP detection [10]. The key point of allele-specific hybridization-based technology is how to ensure high detection specificity. Although a single-base difference in nucleic acid sequence results in different Tm values, in some cases the difference is so small that the specificity of hybridization is difficult to ensure. Therefore, the detection conditions should be carefully optimized and precisely controlled, and sometimes additives (dimethyl sulfoxide, betaine, formamide, etc.) or special probe modifications (locked nucleic acid, peptide nucleic acid, etc.) are required to improve the specificity of hybridization, which increases the complexity of the experimental setup.
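
As a rough illustration of why a single mismatch lowers the Tm, the sketch below applies the Wallace rule, a textbook approximation for short oligonucleotides (Tm ≈ 2 °C per A/T pair + 4 °C per G/C pair) that ignores ionic strength and strand concentration. Counting only the positions that actually form Watson-Crick pairs is a simplifying assumption made here for illustration, not a thermodynamic model, and the probe sequence and function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def wallace_tm(seq: str) -> int:
    """Wallace-rule Tm estimate (deg C) for a fully matched short oligo."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def paired_tm(probe: str, target: str) -> int:
    """Wallace-style estimate that counts only correctly paired positions.

    Both sequences are written 5'->3' and must have the same length.
    Assumption: a mismatched position contributes nothing to the estimate.
    """
    perfect_probe = reverse_complement(target)
    if len(probe) != len(perfect_probe):
        raise ValueError("probe and target must have the same length")
    score = 0
    for probe_base, expected in zip(probe.upper(), perfect_probe):
        if probe_base == expected:
            score += 4 if probe_base in "GC" else 2
    return score


probe = "ACGTTGCAGGTCATCG"  # hypothetical 16-nt probe
matched_target = reverse_complement(probe)
# Introduce one substitution in the middle of the target (C -> A at index 7).
mismatched_target = matched_target[:7] + "A" + matched_target[8:]

print(paired_tm(probe, matched_target))     # 50
print(paired_tm(probe, mismatched_target))  # 46
```

The few-degree gap between the two estimates is the window in which the hybridization temperature has to be controlled, which is why conditions must be optimized so carefully in practice.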

Fig. 11.1 Hybridization with complementary DNA or mismatched DNA

11.2.2 Endonuclease Digestion

An endonuclease is an enzyme that cleaves a nucleotide chain internally, splitting it into two parts. Endonucleases usually recognize specific sequences in the nucleotide chain or special structures formed by oligonucleotides. The specificity of some endonucleases is high enough to distinguish a one-base difference in the recognition sequence or structure, so these enzymes can be used to develop SNP detection assays. The typical endonuclease-based genotyping methods are the Invader assay and restriction fragment length polymorphism (RFLP) analysis.

The Invader assay relies on flap endonuclease 1 (FEN1) to recognize an invasive structure, which is formed when an upstream probe invades by one base the duplex formed between a downstream probe and the target DNA (Fig. 11.2), and to cut the 5′-flap fragment from the downstream probe. Because the Tm of the downstream probe is close to the reaction temperature, the cleaved downstream probe dissociates from the template and an intact downstream probe hybridizes with the template to form the invasive structure again, triggering repeated cycles of cleavage. The cleaved flaps are captured by a hairpin fluorescence reporter probe to form a second invasive structure, which is also recognized by FEN1, leading to cleavage at the 5′-end of the hairpin probe. The hairpin probe is labeled with a fluorescent reporter and a quencher near its 5′-end. After cleavage, the fluorescent group is separated from the quencher, generating a fluorescence signal. Since the Tm of the flap is designed to be close to the reaction temperature, each flap fragment can hybridize dynamically with many hairpin probes in turn, so the fluorescence signal increases gradually. The specificity of the Invader assay is high enough to identify a single-base difference in target DNA, especially when the variant base is the invaded base in the invasive structure.

Fig. 11.2 The principle of the Invader assay

The Invader assay has been successfully used to detect many single nucleotide polymorphisms (SNPs) [11,12,13], such as the factor V G1691A mutation (rs6025) associated with deep venous thrombosis [14]. To achieve quantitative detection, an invasive reaction-based real-time PCR was proposed, in which the Invader assay identifies the PCR amplicons and produces fluorescence signals at each annealing step of the PCR cycles [15, 16]. Benefiting from the high specificity of the Invader assay, the invasive reaction-based real-time PCR can quantitatively detect somatic mutations at an abundance of 0.1% [15] and methylated gene fragments at 0.05% [17].

Polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) analysis combines PCR and RFLP technology to convert RFLPs into PCR product-based markers (Fig. 11.3). First, PCR specifically amplifies the DNA fragment containing the variant base, and then the PCR products are digested by the corresponding restriction enzyme. The principle of RFLP is to detect the size of a particular DNA fragment after restriction endonuclease digestion. Point mutations, insertions, and deletions can create or remove restriction sites, or change the length of the cleavage products [18], resulting in different electrophoresis patterns of the digested PCR amplicons for homozygous wild-type, homozygous mutant, and heterozygous samples.
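
The band patterns described above can be predicted with a small in silico digest. The sketch below is a minimal illustration under stated assumptions: the amplicon sequences are hypothetical, the enzyme is EcoRI (recognition site GAATTC, cut after the first G), only one strand is searched (sufficient for a palindromic site), and digestion is assumed to be complete.

```python
def digest(amplicon: str, site: str, cut_offset: int) -> list[int]:
    """Return fragment lengths after cutting at every occurrence of `site`.

    `cut_offset` is the number of bases of the recognition site that stay
    on the upstream fragment (1 for EcoRI, which cuts G^AATTC).
    """
    amplicon = amplicon.upper()
    cuts = []
    start = 0
    while True:
        idx = amplicon.find(site, start)
        if idx == -1:
            break
        cuts.append(idx + cut_offset)
        start = idx + 1
    bounds = [0] + cuts + [len(amplicon)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


def rflp_pattern(allele1: str, allele2: str, site: str, cut_offset: int) -> list[int]:
    """Distinct band sizes expected for a diploid sample, largest first."""
    bands = set(digest(allele1, site, cut_offset))
    bands |= set(digest(allele2, site, cut_offset))
    return sorted(bands, reverse=True)


# Hypothetical 41-bp amplicons: the variant destroys the EcoRI site.
wild = "ATGC" * 5 + "GAATTC" + "TTAGC" * 3
mutant = "ATGC" * 5 + "GAGTTC" + "TTAGC" * 3

print(rflp_pattern(wild, wild, "GAATTC", 1))      # [21, 20]
print(rflp_pattern(mutant, mutant, "GAATTC", 1))  # [41]
print(rflp_pattern(wild, mutant, "GAATTC", 1))    # [41, 21, 20]
```

The heterozygous sample shows the union of both patterns, which is what allows the three genotypes to be told apart on a gel.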

Fig. 11.3 The principle of PCR-RFLP

Although PCR-RFLP has been successfully used to genotype loci related to genetic diseases (such as cystic fibrosis) [19] and drug-related SNPs [20], its tedious operation and the limited range of restriction endonucleases hinder its application.

11.2.3 Primer Extension

Many genotyping methods are based on primer extension, such as DNA sequencing [21] and allele-specific extension [22]. The specificity of these methods depends mainly on the DNA polymerase, which specifically incorporates the correct dNTP at the 3′-end of a primer that is fully complementary to the target DNA. A DNA target can be genotyped either by detecting the type of dNTP incorporated (DNA mini-sequencing) or by designing allele-specific primers (allele-specific PCR).

Most DNA sequencing methods are based on primer extension (Fig. 11.4). DNA polymerase precisely incorporates a dNTP complementary to the target DNA sequence at the 3′-end of a primer. The type of incorporated dNTP can be identified by measuring a by-product of extension, such as PPi or H+, or by detecting the fluorescence signal produced by labeled dNTPs or by the extended primer itself. Many technology platforms are used to detect these extension products, for example, DNA chips, DNA sequencers, and TOF-MS.

Fig. 11.4 The illustration of primer extension

Besides the high specificity of dNTP incorporation, the recognition specificity of DNA polymerase is high enough to distinguish a one-base mismatch at the 3′-end of the primer. The allele-specific extension method is based on this recognition specificity. For SNP detection, two primers are used whose 3′-ends are complementary to the two alleles, respectively. The DNA polymerase recognizes and extends only the matched primer, so the SNP can be genotyped by measuring the extension signals. Allele-specific PCR is the typical technology that uses primer extension to detect DNA mutations.
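
The allele-specific extension logic can be sketched as follows. This is an idealized model, assuming that a primer is extended only when its entire sequence, including the 3′-terminal base, matches the sample, whereas real polymerases tolerate some 3′ mismatches at low efficiency. The sequences, the SNP position, and the function names are hypothetical.

```python
def extends(primer: str, sense_strand: str, snp_index: int) -> bool:
    """Idealized rule: extension happens only if the primer matches perfectly.

    The primer is written 5'->3' in the same sense as `sense_strand`
    (it anneals to the complementary strand), and its 3'-terminal base
    sits on the SNP position `snp_index`.
    """
    start = snp_index - len(primer) + 1
    if start < 0:
        return False
    return sense_strand[start:snp_index + 1].upper() == primer.upper()


def genotype(primers: dict[str, str], sample_strands: list[str], snp_index: int) -> str:
    """Call a genotype from which allele-specific primers are extended."""
    detected = sorted(
        allele
        for allele, primer in primers.items()
        if any(extends(primer, strand, snp_index) for strand in sample_strands)
    )
    if not detected:
        return "no call"
    if len(detected) == 1:
        return f"{detected[0]}/{detected[0]}"
    return "/".join(detected)


# Hypothetical locus: 25 bases upstream, a G/A SNP, then 7 bases downstream.
upstream = "GATTACAGGCTTACCGATCAAGTCC"
downstream = "GGAACCC"
snp_index = len(upstream)
sense_g = upstream + "G" + downstream
sense_a = upstream + "A" + downstream

# Two primers that differ only in the 3'-terminal base.
primers = {allele: upstream[-17:] + allele for allele in ("G", "A")}

print(genotype(primers, [sense_g, sense_g], snp_index))  # G/G
print(genotype(primers, [sense_g, sense_a], snp_index))  # A/G
print(genotype(primers, [sense_a, sense_a], snp_index))  # A/A
```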

11.2.4 Allele-Specific Oligonucleotide Ligation

The ligation reaction is another principle widely used in genotyping methods. Ligase catalyzes the formation of a phosphodiester bond between the 5′ phosphate group and the 3′ hydroxyl group of two adjacent oligonucleotides, so that a nick in double-stranded DNA can be repaired [23]. A mismatch at the 3′-end or the 5′-end of the adjacent oligonucleotides leads to very low ligation efficiency, and this property can be used to design genotyping strategies. Two typical genotyping techniques are based on oligonucleotide ligation: rolling circle amplification (RCA) and multiplex ligation-dependent probe amplification (MLPA).

RCA is an isothermal and efficient enzymatic process driven by a special DNA polymerase with strand-displacement activity (e.g., Phi29 or Bst DNA polymerase), which generates long single-stranded DNA (ssDNA) from a circularized DNA template [24]. According to the amplification mode, RCA is divided into three types: (1) linear RCA (LRCA) with a single primer [25]; (2) exponential RCA with multiple primers, including hyperbranched RCA (HRCA) [26] and multiprimed RCA [27]; and (3) circle-to-circle amplification [28]. Because more than 10^9 copies of the amplification product can be generated in 90 min, RCA is commonly used for ultrasensitive DNA detection in genomics and diagnostics [29]. For genotyping (Fig. 11.5), a long oligonucleotide probe called a “padlock probe” hybridizes to the target DNA, forming a DNA circle with a nick between the two ends of the probe. The SNP site is located at the 3′-end or the 5′-end of the padlock probe. Only the matched padlock probe can be ligated by ligase to form an intact circularized DNA probe, which serves as the template for RCA. The target DNA can then be genotyped by detecting the RCA products with a fluorescent-labeled detection probe or a fluorescent dye.

Fig. 11.5 Schematic illustration of the linear RCA for genotyping

MLPA is another genotyping technology based on the ligation reaction. In MLPA (Fig. 11.6), a set of probe pairs is used to hybridize to different SNP sites. For each target SNP, an MLPA probe is designed as two oligonucleotides whose target-specific hybridization sequences bind to adjacent regions of the target [30]. After denaturation, the target DNA hybridizes with its specific MLPA probes, and the adjacent hybridization sequences are ligated by a thermostable ligase. The ligated probe is then amplified by PCR with a universal primer pair (F and R), yielding large amounts of amplification product with a unique length (130–480 bp). The amplification products are further analyzed by electrophoresis according to their unique lengths. If no target DNA is present, the MLPA probe cannot be ligated, and the corresponding peak is absent from the capillary electrophoresis trace. Ligation of the MLPA probe is specific enough to discriminate a single-base difference, enabling accurate genotyping. MLPA enables relative quantification of variations at up to 45 SNPs in a single reaction [31].
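
Because each ligated probe yields an amplicon of unique length, reading an MLPA result amounts to matching observed electrophoresis peaks to the designed lengths. The sketch below assumes hypothetical probe names, amplicon lengths within the 130–480 bp range, and a sizing tolerance of ±1 bp; none of these values come from a real probe mix.

```python
def call_mlpa(probe_lengths: dict[str, int], peaks: list[float],
              tolerance: float = 1.0) -> dict[str, bool]:
    """Report, for every probe, whether a peak of the expected length is present.

    `tolerance` (in bp) is an assumed sizing error of the electrophoresis run.
    """
    return {
        name: any(abs(peak - length) <= tolerance for peak in peaks)
        for name, length in probe_lengths.items()
    }


# Hypothetical probe set: one probe per allele of two SNPs.
probe_lengths = {"SNP1_C": 130, "SNP1_T": 136, "SNP2_A": 142, "SNP2_G": 148}
observed_peaks = [130.2, 135.8, 148.1]  # peak sizes in bp

calls = call_mlpa(probe_lengths, observed_peaks)
for probe, present in calls.items():
    print(probe, "ligated" if present else "absent")
```

In this made-up run, both SNP1 probes give a peak (heterozygous) while only the G probe of SNP2 does (homozygous G). Relative quantification would additionally compare peak areas, which this sketch leaves out.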

Fig. 11.6 Schematic illustration of multiplex ligation-dependent probe amplification for genotyping. Each MLPA probe consists of two oligonucleotides. One contains the X sequence complementary to the universal primer R and the hybridization sequence specific to the target, and the other contains the Y sequence complementary to the universal primer F, the hybridization sequence specific to the target, and a phage M13-derived S sequence

11.3 Platforms of Genotyping Technologies

11.3.1 Fluorescence Resonance Energy Transfer

Fluorescence resonance energy transfer (FRET) is a mechanism by which a donor chromophore transfers energy to an acceptor chromophore, changing the spectral characteristics. The efficiency of the energy transfer depends on the distance between the donor and the acceptor. This property is used to develop research tools in biology and chemistry [32]. For genotyping, the TaqMan probe and the molecular beacon (MB) are commonly used FRET-based technologies.

The TaqMan probe is a hydrolysis probe designed to increase the specificity of quantitative PCR. The method was first reported in 1991 by researchers at Cetus Corporation, and the technology was subsequently developed by Roche Molecular Diagnostics for diagnostic assays and by Applied Biosystems for research applications. A typical TaqMan probe is an 18–22-nt oligonucleotide labeled with a reporter fluorophore at the 5′-end and a quencher at the 3′-end. As shown in Fig. 11.7, during PCR the probe anneals specifically to the PCR amplicons and DNA polymerase cleaves the probe through its 5′→3′ exonuclease activity, releasing the reporter from the close vicinity of the quencher and producing a fluorescence signal in each PCR cycle. If the probe sequence does not match the PCR amplicons, no cleavage occurs and the fluorescence signal remains low. The specificity of the TaqMan probe depends mainly on the specificity of probe hybridization. By carefully designing the probe and optimizing the reaction conditions (especially the annealing temperature), a single-base mismatch can be identified. The TaqMan probe enables highly specific, closed-tube analysis of PCR amplicons, but probe design and optimization of the reaction conditions are usually tedious. In addition, the specificity of the TaqMan probe is not satisfactory for somatic mutation detection. Therefore, modified TaqMan probes, such as Minor Groove Binder (MGB) TaqMan probes [33] and locked nucleic acid (LNA)-modified probes [34], are usually used to improve the specificity.

Fig. 11.7 The principle of TaqMan probe-based qPCR

An MB is a FRET probe containing a single-stranded loop and a double-stranded stem formed by hybridization between the 3′-end and the 5′-end of the probe (Fig. 11.8). The reporter fluorophore and the quencher are at the 3′-end and the 5′-end of the probe, respectively, and are held close to each other by the stem when no target DNA is present. The sequence of the loop is complementary to a target DNA sequence. When a target DNA hybridizes to the loop, the stem opens and the reporter fluorophore is separated from the quencher, producing a fluorescence signal. A well-designed MB is sensitive to a single-base difference in the target DNA, especially when the mismatched base is located in the middle of the loop region [35]. Therefore, MBs can be used to detect SNPs. Because the reporter fluorophore and the quencher are located very close together, the background signal from the reporter is lower than that of a TaqMan probe [36]. However, as with TaqMan probes, it is difficult for MBs to detect somatic mutations because of the limited specificity of DNA hybridization.

Fig. 11.8 The principle of molecular beacons

11.3.2 Microarray

The microarray is a tool for detecting DNA sequence variation that was developed at the end of the last century to meet the needs of large-scale gene function research in the post-genomic era. It offers high throughput, simple and convenient operation, easy automation, and low cost, and thus provides an efficient method for high-throughput genotyping.

The principle of the microarray is to immobilize tens of thousands of DNA probes on the surface of a carrier in an ordered manner, by in-situ synthesis or cross-linking, to produce an array of DNA probes. Labeled samples are then hybridized to the immobilized probes, and the intensity of the hybridization signal at each probe location is measured to identify the target DNA. Microarrays achieve genotyping by three strategies.

The first strategy is the hybridization-based genotyping microarray [37]. Amplified PCR products containing the SNP loci are spotted and immobilized on amino-modified glass slides to generate a microarray. Dual-color fluorescent probes specific to the different alleles of each locus are then hybridized to the immobilized PCR products (Fig. 11.9). After washing, the fluorescence signals are detected to read out the SNP genotypes. Although this strategy is simple to operate, the specificity of hybridization depends strictly on the reaction conditions, so false-positive results often occur if the conditions are not optimal.

Fig. 11.9 Illustration of the hybridization-based genotyping microarray

The second strategy is the extension-based microarray [38], in which two immobilized allele-specific primers are extended in a sequence-specific manner. The genotype of a sample is read from the fluorescence intensity of each spot (Fig. 11.10).
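
Reading a genotype from two allele-specific spots can be sketched as a simple ratio rule. The thresholds below (a minimum total signal of 1000 units and a cutoff of 0.8 on the allele-1 signal fraction) are illustrative assumptions, not values from the cited work; real arrays are usually calibrated against control samples.

```python
def call_from_spots(signal_allele1: float, signal_allele2: float,
                    min_total: float = 1000.0, homo_cutoff: float = 0.8) -> str:
    """Call a genotype from the fluorescence of two allele-specific primer spots.

    Assumed rule: spots that are too dim give no call; otherwise the fraction
    of signal on the allele-1 spot decides between the three genotypes.
    """
    total = signal_allele1 + signal_allele2
    if total < min_total:
        return "no call"
    fraction = signal_allele1 / total
    if fraction >= homo_cutoff:
        return "homozygous allele 1"
    if fraction <= 1.0 - homo_cutoff:
        return "homozygous allele 2"
    return "heterozygous"


# Hypothetical spot intensities (arbitrary fluorescence units).
print(call_from_spots(9000, 300))   # homozygous allele 1
print(call_from_spots(4800, 5200))  # heterozygous
print(call_from_spots(200, 8000))   # homozygous allele 2
print(call_from_spots(100, 150))    # no call
```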

Fig. 11.10 Illustration of the extension-based microarray

The third strategy is the ligation-based microarray [39]. As shown in Fig. 11.11, the discriminating probe (DP) and the common probe (CP) are designed to hybridize adjacently on the template DNA and are joined by ligase in the presence of a matching template. The discriminating 3′-base can be A, C, T, or G. The reaction is thermally cycled, and the ligation products are addressed to microarray spots by the unique ZipCode sequence flanking each CP. The hybridization control probe carries a different label (6-FAM) from that of the DP (Cy3).

Fig. 11.11 Illustration of the ligation-based microarray

One advantage of microarrays is that they can be designed to detect a wide diversity of genetic variants. As first demonstrated by Cronin in 1996 [40], oligonucleotide microarrays can readily detect many types of variation. Using a modest-size oligonucleotide array containing 1480 probes, the investigators were able to detect known deletions, insertions, and base substitutions in the cystic fibrosis transmembrane conductance regulator gene. In another example, Glas [41] used microarrays to predict the risk of metastasis in primary breast cancer samples based on the expression pattern of 70 genes. Furthermore, several microarray-based tests that simultaneously examine variations in multiple genes have been approved by the FDA and have entered clinical practice.

11.3.3 Mass Spectrometers

Matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) was initially developed for proteomics studies, and its full potential for nucleic acid analysis was demonstrated in 1995 [42]. Compared with the detection of proteins, the detection of nucleic acids by MALDI-TOF MS suffers from several problems, such as adduct formation caused by the matrix or by traces of alkali metal ions, low sensitivity and resolution, and unstable molecular ion peaks of nucleic acid molecules. Fortunately, with continuous improvement in sample preparation and purification technology, MALDI-TOF MS is now also widely used for DNA analysis, especially for SNP detection [43].

MALDI-TOF MS requires biomolecules to form a co-crystal with a suitable matrix. When a pulsed laser (usually 266 nm or 337 nm) irradiates the crystal, the matrix absorbs the energy and the co-crystal is volatilized. In the gas phase, a proton transfer reaction occurs between the matrix and the biomolecule, so that the biomolecule is ionized. The ions are then accelerated in an electric field of approximately 30 kV and passed through a field-free drift tube. Although all ions of the same charge receive the same kinetic energy in the acceleration zone, the speed of each ion differs according to its mass-to-charge ratio. Therefore, the time of flight (TOF) that each ion needs to reach the detector through the drift tube is different, and the molecular weight of each ion can be calculated from its TOF.
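
The relation between mass and flight time follows from energy conservation: an ion of charge z accelerated through a voltage U gains kinetic energy zeU = ½mv², so it crosses a drift tube of length L in t = L·√(m / (2zeU)). The sketch below evaluates this idealized formula. The 1 m drift length and the example masses are assumptions chosen for illustration, and effects such as the initial velocity spread and the time spent in the acceleration zone are ignored.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms


def flight_time_us(mass_da: float, charge: int = 1,
                   voltage_v: float = 30_000.0,
                   tube_length_m: float = 1.0) -> float:
    """Idealized time of flight (microseconds) through a field-free drift tube.

    From z*e*U = (1/2)*m*v**2 the velocity is v = sqrt(2*z*e*U / m),
    and the flight time is t = L / v.
    """
    mass_kg = mass_da * DALTON
    velocity = math.sqrt(2 * charge * ELEMENTARY_CHARGE * voltage_v / mass_kg)
    return tube_length_m / velocity * 1e6


# Hypothetical singly charged oligonucleotide ions.
for mass in (5773.2, 5788.2, 6000.0):
    print(f"{mass:8.1f} Da -> {flight_time_us(mass):.3f} us")
```

Because t scales with the square root of the mass, a fourfold increase in mass doubles the flight time, and two extension products that differ by a single base arrive only tens of nanoseconds apart under these assumed settings.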

For SNP detection, an allele-specific product must first be generated and then analyzed by MALDI-TOF MS. Four methods are commonly used to generate allele-specific products: primer extension, probe hybridization, restriction enzyme digestion, and ligation. Primer extension is the method most widely coupled with MALDI-TOF MS for SNP detection. In this method, the target DNA is amplified by PCR and then annealed with an extension primer whose 3′-end lies immediately upstream of the SNP site, and single-base extension is performed in the presence of the four ddNTPs. The ddNTP complementary to the SNP site of the target DNA is incorporated at the 3′-end of the primer. The extended primers, which differ in molecular weight, are then identified by MALDI-TOF MS. To improve resolution and accuracy, additional mass tags can be introduced on the primers or the ddNTPs [44].
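
The mass readout of single-base extension can be illustrated with a short calculation. The sketch below uses approximate average masses of the unmodified ddNMP residues added to the primer (reference values, not taken from this chapter), a hypothetical primer mass of 5500.0 Da, and an assumed matching tolerance of 2 Da.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Approximate average mass (Da) added to the primer by each unmodified ddNMP.
DDNMP_MASS = {"A": 297.2, "C": 273.2, "G": 313.2, "T": 288.2}


def expected_masses(primer_mass: float, template_alleles: list[str]) -> dict[str, float]:
    """Mass of the extended primer for each possible template allele.

    The incorporated ddNTP is complementary to the template base at the SNP.
    """
    return {
        allele: round(primer_mass + DDNMP_MASS[COMPLEMENT[allele]], 1)
        for allele in template_alleles
    }


def call_genotype(peaks: list[float], expected: dict[str, float],
                  tolerance: float = 2.0) -> list[str]:
    """Alleles whose expected extension product matches an observed peak."""
    return sorted(
        allele
        for allele, mass in expected.items()
        if any(abs(peak - mass) <= tolerance for peak in peaks)
    )


expected = expected_masses(5500.0, ["G", "A"])  # hypothetical primer mass
print(expected)
print(call_genotype([5773.4, 5788.0], expected))  # heterozygous sample
print(call_genotype([5773.1], expected))          # homozygous sample
```

With these residue masses, the smallest difference between two terminators is 9 Da (A versus T), which suggests why additional mass tags can help to resolve neighboring peaks.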

Primer extension coupled with MALDI-TOF MS is reliable and flexible, and it readily supports high-throughput analysis of multiple SNPs. The MassARRAY® molecular weight array system developed by Sequenom (USA) is the only device that directly detects multiple SNPs by mass spectrometry. It can process two gene chips at the same time, and each chip can analyze 384 samples × 15 SNP sites.

11.3.4 Sequencing Platforms

Sequencing is one of the most widely used techniques for revealing the function of genes, and the development of precision medicine has benefited from advances in sequencing technology [45]. Because sequencing directly reports the sequence at a SNP site, it is the gold standard for SNP detection. The main sequencing technologies are Sanger sequencing, pyrosequencing, and next generation sequencing (NGS).

The first working draft of the human genome was completed using Sanger sequencing technology. In Sanger sequencing, primer extension is performed with four fluorophore-labeled dideoxynucleotides (ddNTPs) mixed with dNTPs, so that extension stops randomly at different positions on the DNA template, producing extension products of different lengths. The extension products are then separated by polyacrylamide gel electrophoresis, and the sequence is read out from the fluorescence signals and the lengths of the products. Sanger sequencing can read up to 1100 bp of target DNA, but it cannot detect mosaic alleles below 15–20% [46], which limits its application in somatic mutation detection.

Unlike Sanger sequencing, pyrosequencing is based on sequencing-by-synthesis. It employs four enzymes (DNA polymerase, ATP sulfurylase, luciferase, and apyrase) to detect the nucleic acid sequence accurately during synthesis. The four dNTPs are added to the reaction system one at a time. If the added dNTP is complementary to the target sequence, DNA polymerase incorporates it at the 3′-end of the primer, releasing an equimolar amount of pyrophosphate (PPi). The released PPi is converted to ATP by ATP sulfurylase, and the ATP is immediately sensed by luciferase, producing a proportional amount of light. Apyrase degrades residual dNTP and ATP so that nothing carries over to the next dNTP addition. The light signals are converted to electric signals by a photosensitive device, and the sequence is determined by measuring these signals. Because the signal is proportional to the amount of dNTP incorporated, pyrosequencing can quantify the target. This property makes pyrosequencing suitable for analyzing DNA methylation [47], gene expression [48], and miRNA abundance [49].
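
The proportionality between signal and incorporated dNTP can be made concrete with a simulated pyrogram. The sketch below is idealized: it assumes complete incorporation, complete apyrase clean-up between dispensations, and one unit of light per incorporated nucleotide. The sequences and the cyclic dispensation order are hypothetical.

```python
def pyrogram(synthesized: str, dispensation: str) -> list[int]:
    """Ideal signal per dispensed dNTP for the strand being synthesized (5'->3').

    A dispensed nucleotide is incorporated as many times as it occurs
    consecutively at the current position (homopolymer stretch).
    """
    synthesized = synthesized.upper()
    position = 0
    signals = []
    for nucleotide in dispensation.upper():
        count = 0
        while position < len(synthesized) and synthesized[position] == nucleotide:
            count += 1
            position += 1
        signals.append(count)
    return signals


def mixed_pyrogram(alleles: dict[str, float], dispensation: str) -> list[float]:
    """Signal of a mixture, weighting each allele's pyrogram by its fraction."""
    total = [0.0] * len(dispensation)
    for sequence, fraction in alleles.items():
        for i, signal in enumerate(pyrogram(sequence, dispensation)):
            total[i] += fraction * signal
    return total


dispensation = "ACGT" * 4
print(pyrogram("GGATCCA", dispensation))
# [0, 0, 2, 0, 1, 0, 0, 1, 0, 2, 0, 0, 1, 0, 0, 0]

# Heterozygous sample: equal amounts of a T allele and a C allele.
print(mixed_pyrogram({"GGATCCA": 0.5, "GGACCCA": 0.5}, dispensation))
```

In the mixture, the peak heights at the variant position become fractional, which is the basis of the quantitative applications mentioned above.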

NGS is a high-throughput sequencing technology that emerged in 2005. Several NGS platforms have been developed, including the 454 FLX pyrosequencing platform, the Solexa Genome Analyzer platform, the ABI SOLiD sequencer, and the Ion Torrent system. The first step on all platforms is fragmentation of the sample DNA, followed by ligation to a common adaptor set for clonal amplification. The sequencing chemistries fall into two main categories: sequencing by ligation (SBL) and sequencing by synthesis (SBS). SOLiD is a ligation-based sequencing platform, while Solexa, the 454 FLX pyrosequencing platform, and the Ion Torrent system are based on SBS. The advent of NGS has revolutionized biology, genetics, pharmacogenomics research, and the diagnosis and treatment of diseases, leading to the development of precision medicine [50]. Although NGS has brought the cost of sequencing one person’s genome below 1000 dollars, the preparation of sequencing libraries and the data analysis are still tedious. Thus, NGS is suitable for high-throughput detection of thousands of SNPs and for genome-wide association studies (GWAS), but it is less useful for detecting a few known SNP sites.

11.4 Conclusion

Genotyping plays an increasingly important role in personalized medicine. At present, more than 100 drug labels approved by the US FDA indicate that attention should be paid to the effect of genotype on drug response, and many genotyping technologies have emerged as a result. Although these techniques differ considerably, they are essentially based on four principles: nucleic acid hybridization, extension, enzymatic digestion, and ligation. Different genotyping platforms have been developed according to these four principles. Each technology has advantages and disadvantages, and the scope of application varies (Table 11.1), so the appropriate genotyping technology must be chosen for each purpose. For example, screening for drug-related gene polymorphism sites often requires high-throughput detection, for which a gene chip or NGS is the better choice. If the number of polymorphism sites to be detected is small, real-time PCR or pyrosequencing is easy to implement. Understanding the principles of these genotyping techniques therefore helps in choosing the right technology for the right detection target. Among these technologies, sequencing has the highest accuracy, and the results of other methods generally need to be verified by sequencing; for this reason, sequencing has always been the gold standard for genotyping. As the cost of NGS continues to decrease, we believe it will eventually become the most important genetic testing tool in the field of personalized medicine.

Table 11.1 Comparison of genotyping technologies