Preface

In recent years, single nucleotide polymorphism (SNP) markers have emerged as the marker technology of choice for plant genetics and breeding applications. Besides the efficient technologies available for SNP discovery even in complex genomes, one of the main reasons for this is the availability of high-throughput platforms for multiplexed SNP genotyping. Advancements in these technologies have enabled increased flexibility and throughput, allowing for the generation of adequate SNP marker data at very competitive cost per data point.

Starting with a technical description of the most widely used SNP genotyping platforms, this chapter aims at discussing potentials and limitations for each technology, thereby providing a basis for the selection of the platform of choice for a specific biological application under technical and economical considerations.

Introduction

Precise molecular marker data with high density at the genomic location under investigation is a basic prerequisite for the molecular dissection of complex traits and the diagnostic application of molecular tools in plant breeding and research. Technological advances in methods for high-throughput genotyping of SNP markers have initiated a novel era of using molecular markers in numerous fields such as genetic linkage analysis and trait mapping, diversity analysis, association studies and single marker or genome-wide marker assisted selection (MAS) (Varshney et al. 2009). The potential of SNPs has been impressively demonstrated in human and animal genetics, as well as in model plant species such as Arabidopsis thaliana, rice (Oryza sativa L.) and maize (Zea mays L.), where fully sequenced genomes resulted in the identification of millions of SNPs suitable for genome-wide association studies (GWAS) and molecular breeding concepts such as genomic selection (GS) (Morrell et al. 2012).

Definition and Characteristics of Single Nucleotide Polymorphism (SNP) in Plants

A SNP represents a nucleotide difference between alleles at a specific locus in the genome. As single nucleotides are the smallest unit of inheritance, SNPs represent the most basic and abundant form of genetic sequence variation in genomes, occurring at frequencies of up to one SNP per 21 bp in plant genomes (Edward et al. 2008). SNPs can be divided into three different forms: nucleotide transitions (a point mutation that changes a purine nucleotide into another purine, or a pyrimidine nucleotide into another pyrimidine, i.e. G⇔A or C⇔T), nucleotide transversions (a point mutation that changes a purine nucleotide into a pyrimidine nucleotide or vice versa) or single nucleotide insertions/deletions (indels) (Edwards et al. 2007). Approximately two out of three SNPs are transitions (Collins and Jukes 1994). This chapter will mostly focus on single nucleotide exchanges (transitions and transversions), as indels are discussed in Chap. 10.
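The distinction between transitions and transversions can be illustrated with a short Python sketch. The function name classify_substitution and its error handling are illustrative assumptions rather than part of any genotyping software, and indels are deliberately not covered.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(allele_1, allele_2):
    """Classify a single nucleotide exchange as 'transition' or 'transversion'."""
    a, b = allele_1.upper(), allele_2.upper()
    valid = PURINES | PYRIMIDINES
    if a not in valid or b not in valid:
        raise ValueError("alleles must be one of A, C, G, T")
    if a == b:
        raise ValueError("identical alleles do not form a polymorphism")
    # Transition: both alleles are purines, or both are pyrimidines.
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"
```

For example, classify_substitution("G", "A") returns "transition", whereas classify_substitution("A", "C") returns "transversion".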

Theoretically, a SNP can involve the four different nucleotide variants A, T, C and G. In practice, SNPs are generally biallelic and the different variants occur at different frequencies (Schmid et al. 2003). The limited polymorphic information content of SNP markers is compensated by their frequent occurrence in the plant genome, making them valuable for targeting any gene of interest, for high density genotyping as well as for haplotyping (International HapMap Consortium 2007). SNPs, however, are not evenly distributed across the genome and occur at lower frequency in coding when compared to non-coding or intergenic regions (Choi et al. 2007). In coding sequences, SNPs do not necessarily change the amino acid sequence of the produced protein; they are often synonymous and, thus, non-functional. If a nucleotide change is non-synonymous and affects the amino acid sequence, missense polymorphisms resulting in different amino acid compositions and nonsense polymorphisms resulting in a premature stop codon can be distinguished. From an evolutionary point of view, SNPs are stable, not changing significantly from one generation to another and, due to the low mutation rate, excellent markers for studying genome evolution (Syvanen 2001).
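The classification of coding SNPs into synonymous, missense and nonsense polymorphisms can be sketched in Python as follows. The sketch assumes the standard genetic code and a codon given on the coding strand; the function name classify_coding_snp is hypothetical, and a change that removes a stop codon is simply reported as missense because the text above does not treat that case separately.

```python
# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_coding_snp(codon, position, alt_base):
    """Classify a SNP at a 0-based position within a coding-strand codon."""
    codon, alt_base = codon.upper(), alt_base.upper()
    if len(codon) != 3 or position not in (0, 1, 2) or alt_base not in BASES:
        raise ValueError("expected a DNA codon, a position 0-2 and a base A/C/G/T")
    mutated = codon[:position] + alt_base + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutated]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"  # premature stop codon
    return "missense"      # any other amino acid change
```

With the glutamate codon GAA as an example, a change to GAG is synonymous, a change to GTA (valine) is missense, and a change to TAA is nonsense.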

SNP genotyping refers to the process of assigning the SNP variant to one of the four nucleotides, thereby discriminating alleles at a particular locus. Recently, a large number of different techniques, chemistries and allele discrimination methods from low- to ultra-high-throughput have been developed and discussed in numerous review articles and books (Syvanen 2001, 2005; Perkel 2008; Ragoussis 2009; Gupta et al. 2008; Bayes and Gut 2011; Chagné et al. 2007).

SNP Genotyping Technologies

New technologies for SNP genotyping are under continuous development and dozens of different platforms are available to date. Here, we focus on the most widely used platforms for plant breeding applications, which we suggest dividing into hybridization-based technologies, enzyme-based technologies and technologies based on physical properties of DNA. The following section does not attempt to provide a fully comprehensive overview of all SNP genotyping technologies available, but aims to describe the key features of technologies that appear promising for plant breeding applications.

Hybridization-Based Technologies

SNP genotyping technologies based on hybridization of DNA probes complementary to the SNP sites include dynamic allele-specific hybridization (DASH) (Podder et al. 2008), molecular beacons (Mhlanga and Malmberg 2001; Täpp et al. 2000) and microarrays (Nazar and Robb 2011). The challenge with these approaches is to minimize cross-hybridization between allele-specific probes, which can be overcome by optimizing hybridization stringency conditions (Nazar and Robb 2011).

SNP Microarrays

The basic principle of a SNP microarray is the convergence of solid surface DNA capture, DNA hybridization, and fluorescence microscopy (Fig. 9.1a). On high density oligonucleotide SNP arrays, allele-specific oligonucleotide probes (AOP) are immobilized at high density on a small chip, allowing for hundreds of thousands of SNPs to be interrogated simultaneously. Each SNP interacts with different oligonucleotide probes. These probes contain the SNP site at several positions, some of them with mismatches to the SNP variant. For efficient hybridization to immobilized probes, the complexity of the genomic DNA must be reduced through digestion with restriction endonucleases (Kennedy et al. 2003). The comparison of hybridization efficiencies of the SNP to each of these redundant oligonucleotide probes makes it possible to identify specific homozygous and heterozygous alleles (Nazar and Robb 2011). Because AOPs differ only in one nucleotide, the target DNA may hybridize to mismatched probes. The comparatively limited specificity and sensitivity of a microarray approach is compensated by the number of SNPs that can be interrogated simultaneously. High density SNP microarrays have mainly been applied in humans, or in major crop plant species such as rice (www.ricearray.org), where SNP arrays are commercially available (www.affymetrix.com). However, the Affymetrix Axiom™ myDesign™ genotyping arrays allow the development of fully customized SNP genotyping arrays containing from 1,500 up to 2.6 million SNPs. Microarray applications are limited to whole genome SNP interrogation at high density and offer only limited flexibility for targeting only selected genes or genomic regions. Although microarray SNP genotyping data achieve pass rates of >95% and accuracy of >99% (Matsuzaki et al. 2004), the total equipment required is rather expensive (Ragoussis and Elvidge 2006).

Fig. 9.1

Array-based SNP genotyping platforms (Paux et al. 2012). (a) Affymetrix Axiom. Fragmented genomic DNA is hybridized to high density oligonucleotide arrays containing the 50 bp nucleotide sequence upstream of the SNP. Differentially labeled probes corresponding to the four different nucleotides with downstream sequence of the SNP are added and ligated. Arrays are then scanned to interrogate each individual SNP. (b) Illumina Infinium. A whole-genome amplification step is followed by hybridization of the amplified DNA to array-bound target sequences that correspond to the 50 bases directly upstream of the SNP. Following hybridization, a single nucleotide extension reaction with hapten-labeled ddNTPs is performed in parallel for all SNPs. Genotype calls are derived from the relative intensity of the fluorescent signals

Enzyme-Based Technologies

The most widely used SNP genotyping technologies are based on a broad range of enzymes including DNA polymerases, ligases, and nucleases. The majority of the technologies presented follow a polymerase or ligase-based primer extension approach. Allele-specific PCR to selectively amplify one of the SNP variants is also known as amplification refractory mutation system (ARMS) (Newton et al. 1989). Two main approaches can be distinguished: firstly, single nucleotide primer extension, where a primer hybridizes immediately upstream of the SNP and a DNA polymerase incorporates a fluorescently labeled dideoxynucleotide (ddNTP) that is complementary to the SNP variant, and, secondly, allele-specific primer extension, where a primer perfectly matching the SNP variant with its 3′ end is hybridized and extended by PCR. Generally, DNA polymerase-based extension is very reliable due to the high sequence specificity and fidelity of DNA polymerases. Moreover, it allows a high degree of flexibility and multiplexing as PCR for primer extension can be performed under very similar reaction conditions, making such methods amenable to high-throughput (Syvanen 2001, 2005). SNP detection following primer extension involves a wide range of methods such as DNA sequencing (Ekstrom et al. 2000), MALDI-TOF mass spectrometry and fluorescent analysis (Tang et al. 2004).

SNuPE™ and SNaPshot™ Single Nucleotide Primer Extension

Low-scale but highly flexible systems based on single nucleotide primer extension (also called mini-sequencing) are the MegaBACE™ SNuPE™ system (GE Healthcare, formerly Amersham Biosciences AB) and the ABI PRISM® SNaPshot™ [Applied Biosystems (Life Technologies)], combining DNA polymerase-based primer extension using fluorescently labeled ddNTPs (terminators) with capillary electrophoresis in a single-tube reaction premix. The product size is determined by the length of the initial primer plus one labeled nucleotide. Multiplexing up to ten primers is possible by using primers of different length (Suharyanto and Shiraishi 2011).
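The size-based multiplexing principle can be sketched in Python: each extension product is one nucleotide longer than its primer, and primers in one reaction must yield distinguishable product sizes. The function names and the minimum spacing of 4 nucleotides are illustrative assumptions; only the limit of ten primers is taken from the text, and in practice dye-dependent mobility shifts would also have to be considered.

```python
def expected_product_sizes(primer_lengths):
    """Map each locus to its extension product size (primer length + 1 ddNTP)."""
    return {locus: length + 1 for locus, length in primer_lengths.items()}


def multiplex_conflicts(primer_lengths, min_spacing=4, max_plex=10):
    """Return pairs of loci whose product sizes differ by less than min_spacing.

    min_spacing is a hypothetical design parameter; max_plex follows the
    ten-primer limit quoted in the text.
    """
    if len(primer_lengths) > max_plex:
        raise ValueError("more primers than the assumed multiplex limit")
    by_size = sorted(expected_product_sizes(primer_lengths).items(),
                     key=lambda item: item[1])
    return [
        (locus_a, locus_b)
        for (locus_a, size_a), (locus_b, size_b) in zip(by_size, by_size[1:])
        if size_b - size_a < min_spacing
    ]
```

An empty list returned by multiplex_conflicts indicates that all primers of a planned multiplex are spaced at least min_spacing nucleotides apart.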

Besides the reaction kits supplied by commercial manufacturers, a capillary-based electrophoresis instrument is required. Sequence data need to be analyzed using software for allele calling, editing, and verification (Torjek et al. 2003). Assay workflow time is around 8 h, with a call rate of >95% and an accuracy of >99.9%. Genomic DNA quantity and quality requirements are low (3 ng per PCR reaction).

APEX (Arrayed Primer Extension) Technology

APEX is a mini-sequencing methodology based on oligonucleotide probes arrayed on slides that are used for primer extension (Shumaker et al. 1996; Pastinen et al. 1997; Kurg et al. 2000). The locus-specific PCR amplification is followed by a fragmentation of PCR products based on uracil N-glycosylase. Fragmented PCR products are then denatured and hybridized to complementary oligonucleotides that have been immobilized on a glass array in a reaction mixture. The buffered reaction mixture contains DNA polymerase and four different terminators (ddNTP), each labeled with an individual fluorescent dye. PCR primers are extended under elevated temperature in order to avoid secondary structures of the oligos. After stringent washing, detection is based on imaging using a microarray reader. Imaging is followed by data analysis to convert the fluorescence information into sequence data (Syvanen 2001). APEX can interrogate hundreds to thousands of SNPs in a single multiplexed reaction simultaneously. As the genotype information is obtained by single base extension performed by a specific DNA polymerase, this approach has a higher discrimination power but a lower throughput per run when compared to methods based on allele-specific oligonucleotide hybridization (microarrays) (Pastinen et al. 1997).

iPLEX® Gold MassARRAY SNP Genotyping

MassARRAY SNP genotyping (Sequenom) combines highly specific single nucleotide primer extension using ddNTPs with MALDI-TOF mass spectrometry. First, a locus specific PCR is used to amplify the SNP target region (Fig. 9.2, blue). Following amplification, a third primer (Fig. 9.2, red) anneals upstream with the 3′ end directly flanking the SNP. This primer is then extended by PCR according to the template sequence, resulting in an allele-specific difference in mass between extension products. SNP variants are detected based on the actual mass of the extension product determined by MALDI-TOF MS (matrix-assisted laser desorption/ionization time-of-flight mass spectrometry) (Sauer et al. 2000). Up to 40-plex reactions in 384-well format allow a single person to generate 100,000 data points per day. MassARRAY is flexible and suitable to generate both small and large marker numbers per sample (Jones et al. 2007). This method is for medium to high-throughput, and is not intended for whole genome scanning. The main advantage of the MassARRAY system is the low cost per data point at its given flexibility (Bagge and Lübberstedt 2008). However, it requires expensive and rather complex instrumentation.
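The throughput figure quoted above can be put into perspective with a small calculation, sketched here in Python. The function names are hypothetical, and the calculation assumes that every well holds one sample and that every assay in a multiplex yields a data point.

```python
import math


def data_points_per_plate(plex_level, wells_per_plate):
    """Maximum number of genotypes from one plate (one sample per well)."""
    return plex_level * wells_per_plate


def plates_for_target(target_points, plex_level, wells_per_plate):
    """Number of full plates needed to reach a target number of data points."""
    per_plate = data_points_per_plate(plex_level, wells_per_plate)
    return math.ceil(target_points / per_plate)
```

Under these assumptions, one 40-plex 384-well plate gives 40 × 384 = 15,360 data points, so that 100,000 data points correspond to seven plates.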

Fig. 9.2

MassARRAY SNP genotyping (Paux et al. 2012). The iPLEX platform combines highly specific single nucleotide primer extension using ddNTPs with MALDI-TOF mass spectrometry

SNPlex™ Genotyping System

Based on oligonucleotide ligation, PCR and capillary electrophoresis, the SNPlex genotyping system [Applied Biosystems (Life Technologies)] allows the detection of up to 48 bi-allelic SNP genotypes in a single reaction (Tobler et al. 2005). First, allele- and locus-specific oligonucleotide probes are hybridized to the target sequence. Successful hybridization then leads to ligation of the two probes and simultaneous ligation of two universal linkers. In order to encode the genotype information of each SNP, a unique ZipCode sequence is added to each allele-specific probe. Genomic DNA, unligated probes and linkers are removed using exonuclease digestion, and all 96 ligation products (two allele-specific products for each of the 48 SNPs) are amplified simultaneously using a single pair of PCR primers. Single-stranded amplicons are produced by binding biotinylated amplicons to the well of a streptavidin-coated microtiter plate and by removing non-biotinylated strands. Universal ZipChute probes containing a sequence complementary to the unique ZipCode sequence in each allele-specific probe, a fluorescent label and a mobility modifier are hybridized to the bound single-stranded amplicons. Finally, amplicons are separated by capillary electrophoresis where SNP genotypes are assigned based on the rate of mobility determined by the mobility modifier of the allele-specific ZipChute probe. SNP detection using the SNPlex system can be completed in 2 days and is amenable to automation, making it a medium to high-throughput system with more than one million data points in a week (Tobler et al. 2005). The protocol is based on standardized hybridization and amplification and does not require optimization for individual SNPs to be genotyped.

For designing SNPlex assays, an automated high-throughput pipeline is available which consists of screening the SNP sequence against the target genome, selecting and designing SNP specific ligation probes, assignment of ZipCode sequences and separating assays into compatible multiplex pools (De la Vega et al. 2005). SNPlex has proven to be particularly powerful in investigations where several hundred SNPs are genotyped in a hundred or more samples, a situation often encountered in projects aiming at the development of tools for MAS in plants.

Molecular Inversion Probes (MIP) Assay

The MIP assay is a large-scale technology from Affymetrix that uses inverted oligonucleotide probes which contain the sequence information of the SNP and its surrounding sequence, and transfers this information into tags analyzed on DNA microarrays (Fig. 9.3a). MIPs originated from padlock probes (Nilsson et al. 1994), which were modified for SNP genotyping so that a gap is formed at the SNP position when the probe is hybridized to the target region. MIPs are circularizable, single-stranded DNA molecules containing two regions complementary to the DNA sequence flanking the target SNP (Fig. 9.3a, red), universal primer sequences (Fig. 9.3a, blue) that are separated by a ribonuclease recognition site (Fig. 9.3a, orange), and a 20 bp sequence tag (Fig. 9.3a, green) (Hardenbol et al. 2003). During the assay, the probes are circularized around the target SNP, complemented with the nucleotides corresponding to the SNPs in four separate allele-specific polymerizations (A, C, G, and T) and ligation reactions. The resulting circular molecule is cleaved between the PCR primers before and after PCR, followed by fluorescent labeling. The labeled molecules are captured on glass microarrays carrying complementary tag sequences for fluorescence detection (Paux et al. 2012).

Fig. 9.3

Ligation-based SNP genotyping methods (Paux et al. 2012). The most prominent examples are (a) Affymetrix’ molecular inversion probe (MIP) and (b) Illumina’s GoldenGate assay

With this technology, multiplex analysis of 3,000 up to 50,000 SNPs can be achieved in a single tube in parallel. Conversion rates are reported to be >80%, with pass rates of >98% and accuracy >99% (Hardenbol et al. 2005). The hardware required for MIP assays is similar to that used for Affymetrix gene chips, apart from the requirement for a four-color scanner.

Illumina GoldenGate Assays

Illumina’s GoldenGate (GG) genotyping assay is an example of a ligation based primer extension using the BeadArray (Fan et al. 2006), or VeraCode technology. Technically, GG combines oligonucleotide ligation and allele-specific extension PCR. The assay is based on three primers per SNP, one locus-specific (LSO, Fig. 9.3b) and two allele-specific primers (ASO1, ASO2, Fig. 9.3b), directly annealing to genomic DNA. This is followed by an extension reaction at the ASO towards the LSO situated a few nucleotides farther from its 3′ end. A ligation reaction links the successfully extended allele-specific product to the LSO, a reaction that gives very high specificity to the assay. As both the ASOs and the LSO contain universal primer tails (Fig. 9.3b, blue), the successfully extended and ligated products are amplified by PCR with fluorescently labeled primers. Denatured PCR products are then hybridized to an array of beads (Sentrix Array) carrying sequences complementary to locus-specific tags located in the LSO sequence (Fig. 9.3b, green). For imaging, Illumina’s iScan array scanner is used for the BeadArray technology. GG genotyping using the BeadArray technology allows levels of multiplexing of 96-, 192-, 384-, 768-, 1,536- and 3,072-SNPs that can be assayed on 32 samples in parallel (Fan et al. 2003). Genotyping with the VeraCode technology with its increased flexibility is available with the GG chemistry at 48-, 96-, 144-, 192-, and 384-plex format. The VeraCode technology can be detected by Illumina’s BeadXpress reader system, thereby increasing sample throughput.

Generally, the GG platform is very flexible: protocols can be performed manually or can be easily automated, and throughput is high (up to 300,000 genotypes per six hands-on hours). Customized oligo pool assays can be designed for many species, but this is generally laborious. GG genotyping has been shown to produce highly reproducible results with a high call rate and accuracy (>99%) (Shen et al. 2005). Despite the high initial cost, the cost per data point is competitive, especially for highly multiplexed chips.

Illumina’s Infinium iSelect HD Custom Genotyping Beadchips

For genome-wide marker profiling, Illumina’s Infinium assay allows the simultaneous genotyping of 3,072–1,000,000 SNPs in customized panels. The assay includes first a whole-genome amplification step, followed by enzymatic fragmentation and hybridization to bead arrays of 50-bp-long capture probes (Fig. 9.1b). The assay uses a single bead type and dual color channel approach, i.e. one color for A and T, another for G and C. After hybridization, allelic specificity is conferred by enzymatic base extension. The hapten-labelled nucleotides are recognized by antibodies that are coupled to a detectable signal (Gunderson et al. 2006). The BeadChips can be deployed on the 24-sample format (3,072–90,000 attempted bead types), the 12-sample format (90,001–250,000 attempted bead types), or the 4-sample format (250,001–1,000,000 attempted bead types). The Infinium HD BeadChips offer the ability to interrogate virtually any SNP for any species; however, they have so far only been used in sequenced model crop species such as soybean (Glycine max L.) (Haun et al. 2011), maize and loblolly pine (Pinus taeda L.). An Infinium assay for tetraploid and hexaploid wheat is on its way (Paux et al. 2012). The two-color system of the Infinium assay somewhat restricts the classes of SNPs that can be genotyped, but high pass rates and accuracy (>99.9%) are characteristic of its performance (Steemers et al. 2006). One of the advantages of this system is that it allows simultaneous measurement of both signal intensity variations and changes in allelic composition (Gupta et al. 2008). DNA requirements are low, ranging from 200 ng for 3,072–250,000 SNPs to 400 ng for 250,001–1,000,000 SNPs.
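Two of the design rules described above can be sketched in Python. The first function reflects our reading of the two-color scheme: since A and T share one color and G and C the other, A/T and C/G SNPs cannot be resolved by color with a single bead type. The second function looks up the sample format from the bead type ranges quoted in the text. All names are hypothetical and not part of any Illumina software.

```python
# Color channels as described in the text: one color for A and T, another for G and C.
COLOR_CHANNEL = {"A": 1, "T": 1, "G": 2, "C": 2}

# (lowest, highest number of attempted bead types, samples per BeadChip),
# taken from the ranges quoted in the text.
BEADCHIP_FORMATS = [
    (3_072, 90_000, 24),
    (90_001, 250_000, 12),
    (250_001, 1_000_000, 4),
]


def resolvable_with_one_bead_type(allele_1, allele_2):
    """True if the two alleles are detected in different color channels."""
    return COLOR_CHANNEL[allele_1.upper()] != COLOR_CHANNEL[allele_2.upper()]


def samples_per_beadchip(n_bead_types):
    """Return the sample format for a given number of attempted bead types."""
    for lowest, highest, samples in BEADCHIP_FORMATS:
        if lowest <= n_bead_types <= highest:
            return samples
    raise ValueError("number of bead types outside the 3,072 to 1,000,000 range")
```

For example, an A/G SNP is resolvable under this scheme whereas an A/T SNP is not, and a panel of 50,000 bead types falls into the 24-sample format.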

Invader® Assay

The basic Invader® assay is based on the hybridization of two oligonucleotide probes and subsequent cleavage using thermostable flap endonucleases (FEN) (Olivier 2005). An allele-specific probe together with an Invader oligonucleotide which overlaps the SNP site with a non-matching probe form a three-dimensional invader structure that can be recognized by an FEN cleavase. The fluorophore attached to the allele-specific probe is separated from its quencher, resulting in a measurable fluorescent signal. This initial assay requires a substantial amount of target DNA and only allows for a single allele to be detected in one assay. Consequently, a biallelic assay was developed based on two subsequent invasive amplification reactions (Olivier 2005). Although the technique is highly reliable and multiplex systems have been developed which allow for more than 20 SNPs to be genotyped simultaneously (Nakahara et al. 2010), possibilities for high-throughput platforms are limited and the assay is suitable for specific applications rather than for general large-scale SNP genotyping.

TaqMan™ Assays

The TaqMan™ assay (Holland et al. 1991) is based on the 5′-nuclease activity of the Taq DNA polymerase and can be analysed using real-time PCR (McGuigan and Ralston 2002). The assay requires forward (FP) and reverse (RP) PCR primers that are used to amplify the region including the SNP (Fig. 9.4a). SNP variants are determined with two fluorescence resonance energy transfer (FRET) oligonucleotides (also called TaqMan probes) that hybridize to the SNP. The probes are fluorescently labeled at their 5′ end and contain a quencher molecule (Q) linked to their 3′ end. During PCR amplification, the allele-specific probe complementary to the SNP allele binds to the target DNA strand and is degraded by the 5′-nuclease activity of the Taq DNA polymerase. Upon cleavage of the probe, the separation of the fluorescent dye from the quencher molecule generates a detectable signal with measurable intensity (Fig. 9.4a). If the allele-specific probe is not complementary to the target SNP, it will have a lower melting temperature and will not perfectly match the SNP site, preventing the nuclease from acting on the probe. The use of sterically modified locked nucleic acids for probe design allowed the use of shorter probes, thereby optimizing hybridization kinetics and improving detection sensitivity (Kennedy et al. 2006).
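Because each of the two probes reports one allele, a genotype can in principle be derived from the relative intensity of the two fluorescence signals. The following Python sketch illustrates this idea for a single sample; the function name, the normalization of the signals and both thresholds are hypothetical, and real analysis software typically clusters all samples of a plate rather than applying fixed cut-offs.

```python
import math


def call_genotype(signal_1, signal_2, min_intensity=0.2, homozygous_angle=30.0):
    """Call a biallelic genotype from two normalized, non-negative dye signals.

    min_intensity and homozygous_angle (in degrees) are illustrative thresholds.
    """
    if signal_1 < 0 or signal_2 < 0:
        raise ValueError("signals are assumed to be non-negative")
    if math.hypot(signal_1, signal_2) < min_intensity:
        return "no call"  # too little fluorescence, e.g. failed amplification
    # 0 degrees: only the allele 1 probe was cleaved; 90 degrees: only allele 2.
    angle = math.degrees(math.atan2(signal_2, signal_1))
    if angle <= homozygous_angle:
        return "allele 1 homozygous"
    if angle >= 90.0 - homozygous_angle:
        return "allele 2 homozygous"
    return "heterozygous"
```

A sample with similar intensities for both dyes is called heterozygous, whereas a sample dominated by one dye is called homozygous for the corresponding allele.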

Fig. 9.4

SNP genotyping based on 5′-nuclease activity of Taq DNA polymerase. This illustration from Paux et al. (2012) compares (a) the TaqMan assay supplied by Applied Biosystems (Life Technologies) with the slightly modified method of KBiosciences KASPar (b)

Since the TaqMan assay is based on a single tube PCR, it is relatively simple to implement, fast and has a high sample throughput (Holloway et al. 1999). The TaqMan assay using well performing probes under optimized reaction conditions can also be multiplexed by combining the detection of up to seven SNPs in one reaction. Thus, TaqMan is an ideal method for genotyping a low to medium number of SNP markers on a high number of samples (Syvanen 2001).

Recently, Applied Biosystems (Life Technologies) combined its TaqMan genotyping assays with the OpenArray® technology, which uses nanoliter fluidics for massively parallel analysis of large samples at lower costs per data point. Different formats are available, ranging from 16 SNPs on 144 samples to 256 SNPs on 12 samples. One person can run up to 24 plates per day without need of automation and generate more than 70,000 genotyping points (256 SNPs × 12 samples × 24 plates = 73,728). This technology has recently been applied in maize, where a set of 162 gene-based SNPs was converted successfully from Illumina GoldenGate to TaqMan assays (Mammadov et al. 2012).

A similar invention is the recently launched Fluidigm Dynamic Array Integrated Fluidic Circuit (IFC), providing an interesting solution to run the TaqMan assay at high sample throughput (http://www.fluidigm.com/snp-genotyping.html). This micro-fluidic PCR system allows for parallel amplifications in separated nano-volumes and thus the interrogation of multiple SNPs on up to 192 samples. The main advantages are the comparably easy workflow (reduced number of pipetting steps) and significantly reduced reagent usage at high data quality, resulting in lower costs and high sample throughput.

KASPar (KBioScience Allele-Specific Polymorphism) Assays

KASPar provided by K-Biosciences (http://www.kbioscience.co.uk/index.html) is a slightly modified method to TaqMan™, still using FRET quencher cassette oligos but a unique form of allele-specific PCR: allele-specific primers complementary to the region upstream of the SNP hybridize in a way that the 3′ end perfectly matches the SNP variant. Each allele-specific primer contains a unique tail sequence at the 5′ end (Fig. 9.4b, green). A common reverse primer complementary to the downstream sequence of the SNP (Fig. 9.4b, blue) and two additional 5′ labeled primers complementary to the allele-specific tail, referred to as reporters (Fig. 9.4b, red), that match oligos containing the quenchers are added to the PCR reaction. In a first step, allele-specific and common reverse primers bind to genomic DNA and generate a product without separating reporter and quencher. Subsequent PCR steps incorporate the labeled reporter complementary to the tail of the allele-specific primer and separate the quencher, thereby generating a fluorescence signal.

As a monoplex, single-step and closed tube system, KASPar is a simple and highly flexible platform on 96, 384 or even 1,536-well plate format. Assay design is easy, but time consuming with increasing number of SNPs. It has a higher SNP conversion rate when compared to TaqMan.

SNP Interrogation Based on Physical Properties of PCR Amplified DNA

The physical characteristics of DNA, i.e. the melting temperature and single strand conformation, are the basis for SNP allele discrimination in this group of SNP genotyping technologies. Methods such as single strand conformation polymorphism (SSCP) (Orita et al. 1989) or temperature gradient gel electrophoresis (TGGE) often require optimized conditions to achieve a high reaction specificity.

High Resolution Melting Curve Analysis (HRM)

High resolution melting curve analysis (HRM) recently emerged as a simple, high-throughput single marker system in plants. HRM measures dissociation of double stranded (ds) DNA of a PCR product amplified in the presence of a saturating fluorescence dye such as LCGreen (Idaho Technology, Salt Lake City, UT) or EvaGreen® (Biotium, Inc. Hayward, CA). The dye integrates itself into the dsDNA of the PCR product. Following PCR, the dissociation of the amplified dsDNA can be monitored with a CCD camera. The shape of the resulting melting curve is used to differentiate the SNP variants.

Two different HRM approaches can be applied for SNP genotyping: the first is a probe-based approach where an unlabeled oligonucleotide probe (also called luna probe) is included in the PCR reaction to interrogate the SNP by post-amplification melting. The probe fully hybridizes with one form of the allele and thus will have a 1 bp mismatch to the alternative SNP variant. During melting, the probe queries the SNP and the genotype is determined by the melting curve shape. The high Tm signal is obtained from the fully hybridized probe, while a lower Tm signal is obtained from the mismatched probe.
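The probe-based approach can be illustrated with a simplified Python sketch that locates peaks in the negative derivative of the melting curve (−dF/dT) and interprets them as matched or mismatched probe signals. The sigmoid melting model, the function names, the Tm values and all thresholds are assumptions chosen for illustration; they do not represent the algorithm of any HRM instrument software.

```python
import math


def simulated_melt(temps, components, width=0.5):
    """Toy melting curve: one sigmoid per (fraction, Tm) component."""
    return [
        sum(frac / (1.0 + math.exp((t - tm) / width)) for frac, tm in components)
        for t in temps
    ]


def derivative_peaks(temps, fluorescence, min_height=0.1):
    """Temperatures at which -dF/dT has a local maximum above min_height."""
    mids = [(t0 + t1) / 2 for t0, t1 in zip(temps, temps[1:])]
    slopes = [
        -(f1 - f0) / (t1 - t0)
        for t0, t1, f0, f1 in zip(temps, temps[1:], fluorescence, fluorescence[1:])
    ]
    return [
        mids[i]
        for i in range(1, len(slopes) - 1)
        if slopes[i] >= min_height
        and slopes[i] > slopes[i - 1]
        and slopes[i] >= slopes[i + 1]
    ]


def call_from_probe_melt(peaks, match_tm, mismatch_tm, tolerance=1.0):
    """Interpret probe melting peaks: high Tm = matched, low Tm = mismatched."""
    has_match = any(abs(p - match_tm) <= tolerance for p in peaks)
    has_mismatch = any(abs(p - mismatch_tm) <= tolerance for p in peaks)
    if has_match and has_mismatch:
        return "heterozygous"
    if has_match:
        return "homozygous (probe-matched allele)"
    if has_mismatch:
        return "homozygous (alternative allele)"
    return "no call"


# Simulated heterozygote: half of the probe melts at 62 °C, half at 68 °C.
temps = [55.0 + 0.1 * i for i in range(201)]
het_curve = simulated_melt(temps, [(0.5, 62.0), (0.5, 68.0)])
het_peaks = derivative_peaks(temps, het_curve)
het_call = call_from_probe_melt(het_peaks, match_tm=68.0, mismatch_tm=62.0)
```

In this toy example the heterozygote shows two derivative peaks, one at the Tm of the fully hybridized probe and one at the lower Tm of the mismatched probe, whereas a homozygote shows only one of them.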

The second approach refers to short amplicon genotyping and queries the SNP directly in the PCR product without the need of a probe (Liew et al. 2004). Here, primers are designed to directly flank the SNP in order to minimize the chances of amplifying additional polymorphisms. Therefore, the entire amplicon may be as short as 37–44 bp. The melting curves of most homozygotes are sufficiently different. Heterozygotes are even easier to differentiate because they form heteroduplexes, which broaden the melt transition and usually give two discernible peaks. HRM is simple, highly effective, and cheap. Multiplexing up to 3 SNPs is possible. Unique to this technology is that it can also be used for genotyping any other type of DNA sequence polymorphism present in the amplified fragment (Studer et al. 2009).

Considerations for Selecting SNP Technologies for Breeding Applications

The selection of a suitable technology for SNP genotyping mainly depends on the specific biological question, and, consequently, a bottom-up approach in the decision process is suggested. For example, gene-targeting and marker assisted backcrossing may call for a low to moderate number of markers to be screened in hundreds or thousands of samples. Thus, technologies such as HRM, SNuPE/SNaPshot, TaqMan, KASPar, or Invader might be interesting. For genetic diversity studies, the selection of parental lines or trait mapping, a moderate to high number of markers are genotyped on a moderate number of samples, which favors platforms such as iPLEX, SNPlex, APEX or GoldenGate. For GWAS and GS, ultra-high density platforms such as MIP, Infinium, Axiom, or genotyping by sequencing (GBS) might be necessary (Table 9.1, Fig. 9.5). Within these three groups, the technology of choice mainly depends on technical and economical considerations.
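The grouping described above can be written down as a simple lookup, sketched here in Python. The dictionary only restates the three groups named in the text; the key names and the function shortlist_platforms are hypothetical, and the result is a shortlist to be refined by technical and economical considerations, not a recommendation.

```python
GENOTYPING_SCALES = {
    "low to moderate marker number, many samples": {
        "applications": ("gene targeting", "marker assisted backcrossing"),
        "platforms": ("HRM", "SNuPE/SNaPshot", "TaqMan", "KASPar", "Invader"),
    },
    "moderate to high marker number, moderate sample number": {
        "applications": ("genetic diversity studies",
                         "selection of parental lines",
                         "trait mapping"),
        "platforms": ("iPLEX", "SNPlex", "APEX", "GoldenGate"),
    },
    "ultra-high density": {
        "applications": ("GWAS", "GS"),
        "platforms": ("MIP", "Infinium", "Axiom", "GBS"),
    },
}


def shortlist_platforms(application):
    """Return (scale, platforms) for an application named in the text."""
    wanted = application.strip().lower()
    for scale, entry in GENOTYPING_SCALES.items():
        if wanted in (name.lower() for name in entry["applications"]):
            return scale, entry["platforms"]
    raise KeyError("application not covered by this sketch: " + application)
```

Calling shortlist_platforms("trait mapping"), for instance, returns the medium density group together with iPLEX, SNPlex, APEX and GoldenGate.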

Table 9.1 Genotyping scales and potential breeding applications of selected SNP genotyping platforms. Main characteristics of each platform in terms of the providing company, the assay type and the detection technology are summarized
Fig. 9.5

Applicability of SNP genotyping platforms as a function of SNP number (x-axis, log scale) and sample number (y-axis, log scale). SNP genotyping platforms were divided into three main groups (white-blue, light-blue and blue) representing single marker, medium density and (ultra)-high density SNP assays, respectively. In order to increase readability, SNPlex and iPLEX were assigned to medium density assays, but can technically also handle single SNPs

Technical Considerations

The relationship between the number of SNPs and the number of samples under investigation constitutes an important technical issue and has been described earlier (Bagge and Lübberstedt 2008). As these two factors may vary substantially from one SNP genotyping project to another, the flexibility of an assay, i.e. the ability to adapt a technology to a specific number of SNPs and samples, determines a technology’s applicability (Fig. 9.5).

The challenge of successful assay design adds another technical dimension. It has been shown that the genomic DNA sequence flanking the target SNP has a high impact on genotyping performance (Grattapaglia et al. 2011). Some alleles may have mutations or additional polymorphisms in primer binding sites leading to null alleles. This can be problematic in allogamous plant species where SNPs generally occur at higher frequencies. Further characterization of the target SNP and the flanking region can be achieved by allele sequencing prior to SNP genotyping assay design. However, allele sequencing might be limited to low and medium SNP density assays.

DNA pooling has been suggested as a practical way to reduce SNP genotyping costs but requires the ability to measure allele frequencies in pools of individuals (quantitative genotyping). Additional technical factors are, among others, (1) achieved call rates and accuracy, (2) the DNA quantity and quality required, and (3) the degree of automation. Main technical characteristics of selected SNP genotyping platforms are summarized in Table 9.2.
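Two of the quantities mentioned above, call rate and allele frequency in a DNA pool, can be illustrated with a minimal sketch. The chapter does not specify how allele frequencies are estimated from pools. The example below assumes a common simplification in which two allele-specific signals are compared and a correction factor `k` (the mean signal ratio of allele A to allele B in known heterozygotes) compensates for unequal signal strength. All function and parameter names are hypothetical.

```python
def call_rate(calls):
    """Fraction of attempted genotypes that received a call.

    A missing call is represented by None.
    """
    if not calls:
        raise ValueError("calls must not be empty")
    return sum(c is not None for c in calls) / len(calls)


def pooled_allele_frequency(signal_a, signal_b, k=1.0):
    """Estimate the frequency of allele A in a DNA pool.

    signal_a, signal_b: allele-specific signal intensities of the pool.
    k: assumed correction factor, i.e. the A/B signal ratio observed in
       heterozygous individuals (1.0 means both alleles give equal signal).
    """
    denominator = signal_a + k * signal_b
    if denominator <= 0:
        raise ValueError("signals must give a positive total")
    return signal_a / denominator


if __name__ == "__main__":
    print(call_rate(["AA", None, "AB", "BB"]))
    print(pooled_allele_frequency(300.0, 100.0))
    # With k = 1.5, a raw 150:100 signal corresponds to equal frequencies.
    print(pooled_allele_frequency(150.0, 100.0, k=1.5))
```

Whether a platform supports this kind of quantitative readout at all is one of the technical characteristics to check before planning a pooling experiment.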

Table 9.2 Main technical characteristics of selected SNP genotyping platforms applied in plant breeding

Economic Considerations

Cost effectiveness, both in terms of initial investment and cost per data point, is a major factor for the deployment of a SNP genotyping technology. As shown earlier, it includes salary, fixed costs per SNP, and consumables (Bagge and Lübberstedt 2008). However, for several reasons, it is difficult, if not impossible, to capture all the different economic levels and combine them in a global concept. Firstly, instrument, consumable, and labor costs are subject to major fluctuations in a changing economic environment: they vary between companies, countries, and currencies, as well as over time. Secondly, with increasing flexibility of SNP genotyping technologies in terms of SNP and sample numbers, the cost range of a particular technology varies extensively across multiplex levels and increasingly overlaps with the cost ranges of other platforms, which makes general comparisons across genotyping technologies meaningless. Thirdly, it makes a difference whether SNP genotyping is outsourced or (partially) conducted in-house, which in turn depends on the lab equipment and technical assistance available. All this makes it difficult to compare costs per data point. Therefore, a case-by-case decision rather than a global conclusion will be necessary. In this sense, the following points aim to describe major economic tendencies rather than to provide a global guide for selecting the most economical platform.

For single marker assays, HRM outperforms all other technologies both in terms of cost for investments as well as cost per data point. Main reasons are (1) low variable cost (PCR chemistry including a saturating fluorescent dye), (2) low initial investments (a PCR thermocycler including an optical unit), (3) the ease of assay-design (primer design flanking a SNP, a single PCR reaction followed by melting of the PCR product), and (4) the speed of the assay (2 h including data analysis) leading to cost savings in terms of labor.

Among the low to medium multiplex platforms, Sequenom’s MassARRAY platform seems most effective in terms of cost per marker data point (at least for sample numbers close to 384) up to a high SNP number. The SNP number at which a platform with a higher multiplex level, such as Illumina GoldenGate, becomes more cost effective cannot be generally determined, as this depends on the sample number. For other low to medium multiplex platforms (e.g. KASPar, TaqMan, Invader, SNPlex), cost effectiveness depends on the number of markers and the number of samples that can be analyzed in parallel.

On the genome-wide scale, GBS will emerge as the technology of choice, and further reductions in reagent, consumable, and sequencing costs can be expected. In combination with the increased throughput of next generation sequencing (NGS) platforms such as Illumina’s HiSeq™ 2000 and the Applied Biosystems (Life Technologies) SOLiD™ sequencing system, or even third generation single molecule sequencing (Thompson and Milos 2011), GBS has tremendous potential for cost reduction (Davey et al. 2011).

Of major importance in economic considerations is the sample number, particularly for technologies with high fixed costs per SNP such as TaqMan or KASPar: cost per data point will only become competitive when a high number of samples is screened. A further consideration is the investment necessary for in-house SNP genotyping. Initial investment costs vary significantly between platforms requiring an RT-PCR instrument, a capillary electrophoresis instrument, a mass spectrometer, or a complete solution for running an NGS instrument. The lab equipment already available will ultimately drive the decision.
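The effect of sample number on cost per data point can be made concrete with a small calculation. The sketch below assumes a simple cost model with three components (a fixed cost per SNP assay, a variable cost per data point, and an optional labor total), loosely following the cost components named above. The model, the function names, and all figures in the usage example are hypothetical and are not prices of any real platform.

```python
def cost_per_data_point(n_snps, n_samples, fixed_per_snp,
                        variable_per_point, labor=0.0):
    """Average cost per data point under a simple additive cost model."""
    points = n_snps * n_samples
    if points <= 0:
        raise ValueError("n_snps and n_samples must be positive")
    total = n_snps * fixed_per_snp + points * variable_per_point + labor
    return total / points


def break_even_samples(fixed_a, variable_a, fixed_b, variable_b):
    """Sample number above which platform A is cheaper than platform B.

    Platform A is assumed to have the higher fixed cost per SNP and the
    lower variable cost per data point. Returns None if A never becomes
    cheaper, i.e. if its variable cost is not lower than that of B.
    """
    if variable_b <= variable_a:
        return None
    return (fixed_a - fixed_b) / (variable_b - variable_a)


if __name__ == "__main__":
    # Invented figures: an assay with a high fixed cost per SNP.
    for n_samples in (100, 1000, 10000):
        cost = cost_per_data_point(10, n_samples, fixed_per_snp=200.0,
                                   variable_per_point=0.1)
        print(n_samples, round(cost, 3))
    # Invented figures: A (200 fixed, 0.1 variable) vs B (20 fixed, 0.5 variable).
    print(break_even_samples(200.0, 0.1, 20.0, 0.5))
```

In this model the fixed cost per SNP is spread over all samples, so the cost per data point falls towards the variable cost as the sample number grows. Real comparisons would also have to account for initial investment and multiplex level, which this sketch leaves out.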

SNP Genotyping for Plant Breeding

For plant breeders, two main scenarios have emerged. On the one hand, large breeding companies working with species with an established (or nearly completed) reference genome make use of a very high number of available SNPs for GWAS and GS (Hamblin et al. 2011). For these purposes, customized Affymetrix Axiom™, Illumina Infinium and Invader arrays, or GBS strategies might be chosen.

On the other hand, small to medium scale breeding companies with a lower budget allocated to molecular breeding approaches might focus on functional markers (Andersen and Lübberstedt 2003) for a few traits only, but on a very large scale. In this case, single SNP technologies with an enormous sample throughput are of interest.

Future Prospects of SNP Genotyping in Plants

For single marker approaches, higher sample throughput and further cost reductions can be achieved by reducing reaction volumes and using higher density plate, chip, or array formats. Nanofluidic platforms such as OpenArray or the Fluidigm Dynamic Array provide interesting solutions for nanoliter-scale PCR on multiple samples. For more complex traits addressed by GWAS and improved by GS in breeding programmes, the detection of haplotypes will be important to understand functional effects of SNPs in cis and meiotic recombination. Thus, the utility of SNP arrays for long range haplotyping in plants will become an important issue in the future.

Looking beyond plant genomics, structural or copy number variation (CNV), defined as genome fragments larger than 1 kb varying in copy number between individuals, has emerged as a significant contributor to human genetic variation in addition to sequence variants (Redon et al. 2006). CNVs are increasingly evaluated for their contribution to phenotypes, prompting a new race to incorporate assays for CNVs within SNP genotyping chips and new analysis algorithms to infer CNVs from SNP genotyping data (Ragoussis 2009). Resequencing strategies will be very promising to assess CNV and genomic regions difficult to address by SNPs (International HapMap Consortium 2007).

Advancements in the throughput and multiplexing capacities of NGS technologies offer the opportunity to bypass array-based genotyping by means of sequencing (see Chap. 11). GBS will prove extremely powerful for ultra-high density SNP genotyping applied in GWAS and GS (Hamblin et al. 2011). A major advantage of GBS is the reduced ascertainment bias, i.e. the bias that arises when different plant material is used for SNP discovery (or SNP ascertainment) and for SNP genotyping. Moreover, GBS allows allele frequencies to be characterized in genetically heterogeneous plant populations, which is especially useful in crop species where cultivars consist of open-pollinated populations. For small plant genomes such as those of Arabidopsis or rice, for which high-quality reference genome sequences are established, whole-genome resequencing might be the most powerful and straightforward genotyping approach (Huang 2009). For larger and more complex genomes, target enrichment (see Chap. 5) or complexity reduction strategies (see Chap. 11) will allow sequencing of only a well distributed portion of the genome. A cost-effective GBS approach targeting a small portion of the genome has recently been described and demonstrated in both maize and barley mapping populations (Elshire 2011). However, GBS requires careful consideration of the experimental setup so that the right proportion of the genome is sequenced at sufficient depth to meet the SNP number and accuracy required for the application in question. The potential of GBS to replace dedicated marker technologies remains to be demonstrated.
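The trade-off between the sequenced proportion of the genome, sequencing depth, and sample number can be sketched as a back-of-the-envelope calculation. The example below assumes that reads are distributed uniformly across samples and across the targeted portion of the genome and that every read is usable, which is optimistic for real GBS libraries. The function names and all input values are illustrative assumptions, not figures from the chapter or from any instrument specification.

```python
def expected_depth(total_reads, read_length_bp, genome_size_bp,
                   fraction_sequenced, n_samples):
    """Mean depth per sample over the targeted portion of the genome."""
    target_bp = genome_size_bp * fraction_sequenced
    if target_bp <= 0 or n_samples < 1:
        raise ValueError("target size and sample number must be positive")
    reads_per_sample = total_reads / n_samples
    return reads_per_sample * read_length_bp / target_bp


def max_samples_per_run(total_reads, read_length_bp, genome_size_bp,
                        fraction_sequenced, min_depth):
    """Largest number of samples that still reaches min_depth on average."""
    target_bp = genome_size_bp * fraction_sequenced
    if target_bp <= 0 or min_depth <= 0:
        raise ValueError("target size and minimum depth must be positive")
    return int((total_reads * read_length_bp) // (target_bp * min_depth))


if __name__ == "__main__":
    # Hypothetical run: 200 million reads of 100 bp, a 2.5 Gb genome,
    # 2 % of the genome targeted after complexity reduction.
    depth = expected_depth(200e6, 100, 2.5e9, 0.02, n_samples=96)
    print(round(depth, 2))
    print(max_samples_per_run(200e6, 100, 2.5e9, 0.02, min_depth=10))
```

Lowering the sequenced fraction or the number of multiplexed samples raises depth and therefore genotype accuracy, at the price of fewer SNPs or fewer individuals per run.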

Conclusions

Current SNP genotyping technologies offer a wide range of opportunities for using SNP markers as a diagnostic tool in plant breeding. These technologies involve single SNP assays genotyped on thousands of samples, a wide spectrum of multiplexed assays run on several samples in parallel, as well as high-throughput genotyping platforms interrogating millions of markers simultaneously at genome-wide coverage.

No single technology can answer all plant breeding related questions, and each technology has its own advantages and disadvantages. The selection of a suitable technology for SNP genotyping mainly depends on the biological question, which determines the SNP and sample numbers under investigation. Within three main groups (single marker, medium density and high density assays), selection criteria involve technical (flexibility, throughput, success rate, degree of automation) and economic (cost per data point, time) considerations.

SNP genotyping technologies are continuously evolving, and the best technology today is likely to become outdated in the near future. For single marker approaches, higher sample throughput and further cost reductions can be achieved by minimizing reaction volumes and using nanofluidics. For genome-wide SNP genotyping, the ability to detect haplotypes and structural variations is likely to become an important issue. For both, GBS seems promising.