Introduction

Barley is an important cereal crop which has been cultivated for thousands of years (Reets and Léon 2004). It ranks fourth in terms of production volume behind maize, rice and wheat and is primarily used for food, feed and the production of alcoholic beverages (FAO 2014). Barley is cultivated in different climates, soils and environments and is exposed to diverse forms of abiotic and biotic stress. Cultivars and varieties with yield, nutritional quality and agronomic performance optimized for different environments are therefore needed to supply the growing global population (UN 2008). Optimized characteristics can be achieved by conventional breeding, but this takes up to 8 years because phenotype-based testing requires fully grown plants (Borlaug 1983; ISAAA 2014). The speed and efficiency of plant breeding can be improved by adopting technologies such as reverse breeding, marker-assisted selection (MAS) and genetic modification, all of which have advantages and disadvantages (He et al. 2014; Jonas and de Koning 2013; Nakaya and Isobe 2012; Varshney et al. 2009). These modern techniques are steadily replacing or augmenting classical breeding approaches (Xu et al. 2012).

Next-generation sequencing (NGS) generates large amounts of data in a short time by producing thousands or even millions of reads in parallel. The increasing throughput and falling costs of NGS have encouraged multiple applications in different areas of the life sciences, including medicine (Metzker 2010) and agriculture (Elshire et al. 2011; Mascher et al. 2013; Teixeira et al. 2014; You et al. 2011). For example, the high-throughput sequencing of large numbers of amplicons has been used to genotype the human leukocyte antigen (HLA) locus (Bentley et al. 2009; Holcomb et al. 2011) and to determine zygosity in transgenic maize (Fritsch et al. 2015), but has yet to be applied in the plant breeding sector.

Using a multiplex PCR-based approach (Bentley et al. 2009; Fritsch et al. 2015), we generated amplicons representing naturally occurring polymorphisms in two barley genes, namely the flowering time habit locus VrnH1 (Fu et al. 2005; Szucs et al. 2007; von Zitzewitz et al. 2005) and the grain protein content locus HvNAM1 (Cai et al. 2013; Uauy et al. 2006) (Fig. 1). Each locus represents a relevant trait for breeding programs: VrnH1 in terms of adaptation to environmental conditions and HvNAM1 in terms of grain quality. The coverage of more than 1000 reads per PCR product ensured that the sequencing data were statistically valid and reduced the impact of sequencing errors, especially in the hybrids where anticipated polymorphisms only represented half of the reads. We validated our assay using two different genes containing different kinds of polymorphisms, i.e., indels and single nucleotide polymorphisms (SNPs). We found that our genotyping method is easy to implement, requires no PCR optimization steps and is suitable for the high-throughput analysis of many different samples and/or polymorphisms simultaneously.

Fig. 1

a Schematic depiction of the VrnH1 gene. The total size of the VrnH1 gene is 14,776 bp (including the potential insertions). The genotypic features of the winter barley line Strider are shown in bold, whereas normal font shows the genotypic features of the spring barley line Morex. b Schematic depiction of the HvNAM1 gene. The total size of the HvNAM1 gene is 1585 bp. The genotypic features of the low grain protein content barley line Karl are shown in bold, whereas normal font shows the genotypic features of the high grain protein content barley line Clipper. Horizontal bars show stretches of DNA with thick black bars representing exons, thin black bars representing introns and the thick white bars representing untranslated regions. The polymorphisms are indicated by red arrows. (Color figure online)

Materials and methods

Barley seeds, plant cultivation and interbreeding

Barley (Hordeum vulgare) seeds of the winter barley line Strider and the spring barley line Morex were used to prepare genomic templates representing the flowering time locus VrnH1. The Strider VrnH1 locus contains additional sequences that are not present in Morex; thus, the locus is characterized by an indel polymorphism (Fig. 1a). Similarly, the Karl and Clipper lines were used to prepare genomic templates representing the grain protein content locus HvNAM1. Karl is a low grain protein content line with guanosine residues at three single nucleotide polymorphisms (SNPs) in this locus, whereas Clipper is a high grain protein content line with cytidine residues at the SNPs in the first two exons and an adenosine residue at the SNP in exon three (Fig. 1b). All seeds were provided by the National Small Grains Collection (Aberdeen, Idaho, USA) of the United States Department of Agriculture (USDA). To generate hybrids, the homozygous seeds were planted in Jiffy-7 pellets (Jiffy Products International BV, Moerdijk, Netherlands) and transferred to pots filled with Einheitserde classic substrate (Einheitserdewerke Werkverband e.V., Sinntal-Altengronau, Germany) after ~14 days. As soon as the awns started to grow out of the husks, the mother plant was emasculated by removing immature anthers from the flowers. To interbreed the plants, ripe anthers from the paternal line were transferred to the emasculated flower and placed on the stigma of the maternal plant 2–3 days after the emasculation. The resulting hybrid seeds were harvested after ~3 months, when the seeds and plants were completely dry (Cornelia Marthe and Dr. Jochen Kumlehn, IPK Gatersleben, Germany, personal communication). To extract DNA, the homozygous and heterozygous seeds were planted in Jiffy-7 pellets and leaves from 7- to 14-day-old plants were processed with the NucleoSpin Plant II kit (Macherey-Nagel, Düren, Germany) according to the manufacturer’s protocol. All DNA samples were dissolved in elution buffer (50 mM Tris–HCl, pH 8.5) to a final concentration of 150–250 ng/µl.

Primers: general aspects

Primers were designed using CLC Main Workbench version 6.9.1 (CLC Bio, Qiagen, Venlo, Netherlands) and synthesized by MWG-Biotech (Ebersberg, Germany). Primers for single PCR and/or for multiplex PCR had a length between 19 and 21 bases and a melting temperature between 58 and 62 °C. The primer sequences are listed in Table S1 (online resource). The reference sequences for primer design were obtained from the NCBI nucleotide database (http://www.ncbi.nlm.nih.gov/nuccore/AY750993.1 for the VrnH1 gene in the Strider line; http://www.ncbi.nlm.nih.gov/nuccore/AY750995.1 for the VrnH1 gene in the Morex line; and http://www.ncbi.nlm.nih.gov/nuccore/EU368851.1 for the HvNAM1 gene). The barcodes and adapters associated with each primer are listed in Tables S2 and S3 (online resources).

Multiplex PCR to generate NGS templates

NGS templates were prepared by PCR on a Veriti 96-well thermocycler (Life Technologies, Carlsbad, USA) using the Expand High Fidelity PCR System (Roche, Rotkreuz, Switzerland), Axygen eight-strip tubes (Thermo Fisher Scientific, Waltham, USA) and eight-lid flat-cap strips (Sarstedt, Nümbrecht, Germany). Each reaction comprised 3.5 U Taq/Tgo DNA polymerase enzyme mix, 500 µM dNTPs, 0.5 µM of each primer (Table S1, online resource) and 150–200 ng of template DNA, topped up to 50 µl with the buffer supplied in the kit. The template was denatured at 95 °C for 2 min and then amplified (30 cycles at 95 °C for 30 s, 55 °C for 30 s and 72 °C for 30 s) followed by a final elongation step for 4 min at 72 °C and indefinite storage at 8 °C. PCR products were sized and quantified by capillary electrophoresis using the Agilent DNA 1000 Kit on an Agilent 2100 Bioanalyzer according to the manufacturer’s instructions (Agilent Technologies). The PCR products were pre-diluted to at least 100 pM according to the concentration determined by capillary electrophoresis (data not shown). Because each reaction was a multiplex PCR, the dilution factor was based on the least concentrated amplicon in the reaction.
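The pre-dilution step can be illustrated with a minimal sketch. It assumes that the capillary electrophoresis readings are given in ng/µl and uses the common approximation of 660 g/mol per base pair of double-stranded DNA; the function names and the example readings are hypothetical and are not taken from our data.

```python
# Approximate molar mass of one double-stranded DNA base pair (g/mol).
# This is a widely used approximation, not a value from the protocol.
AVG_BP_MASS = 660.0


def amplicon_pm(conc_ng_per_ul: float, length_bp: int) -> float:
    """Convert a mass concentration (ng/µl) into a molar concentration (pM)."""
    return conc_ng_per_ul * 1e9 / (AVG_BP_MASS * length_bp)


def dilution_factor(amplicons, target_pm: float = 100.0) -> float:
    """Fold dilution that brings the least concentrated amplicon to target_pm.

    All other amplicons in the same multiplex reaction then remain above
    the target, i.e. every product is present at 100 pM or more.
    """
    lowest = min(amplicon_pm(conc, length) for conc, length in amplicons)
    return lowest / target_pm


# Hypothetical readings for one multiplex reaction: (ng/µl, length in bp)
example = [(4.0, 146), (6.5, 210), (9.0, 268)]
print(f"Dilute approximately {dilution_factor(example):.0f}-fold")
```

Basing the factor on the least concentrated amplicon means that no product of the multiplex reaction falls below the target concentration.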

Next-generation sequencing

The pre-diluted PCR products were purified and diluted to the final concentration of 26 pM using the Ion Library Equalizer Kit (Life Technologies). To this end, 2-µl aliquots from each multiplex PCR vessel were pooled and topped up to 50 µl with elution buffer (50 mM Tris–HCl, pH 8.5). The pool was processed with the Ion Library Equalizer Kit using 90 µl AMPure beads and otherwise following the manufacturer’s protocol. The pooled and equalized PCR products were then sequenced on an Ion Torrent Sequencer (Life Technologies) using an Ion 316 chip (Life Technologies). The results were analyzed using Lasergene Genomics Suite software (DNASTAR, Madison, USA). The barcodes shown in Table S2 (online resource) were used to differentiate among the samples.
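The principle of assigning pooled reads to samples by barcode can be sketched as follows. This is an illustration only and not the procedure implemented in the analysis software: it assumes that each read starts with its barcode and that barcodes match exactly, and the barcode sequences and sample assignments shown are placeholders (the barcodes actually used are listed in Table S2).

```python
from collections import defaultdict

# Placeholder barcode-to-sample table for illustration only.
BARCODES = {
    "CTAAGGTAAC": "Strider",
    "TAAGGAGAAC": "Morex",
    "AAGAGGATTC": "Morex x Strider",
}


def demultiplex(reads, barcodes=BARCODES):
    """Group reads by their leading barcode and strip the barcode."""
    by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched the start of the read.
            by_sample["unassigned"].append(read)
    return dict(by_sample)


pooled = ["CTAAGGTAACACGT", "TAAGGAGAACGGCC", "NNNNNNNNNNACGT"]
print(demultiplex(pooled))
```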

Results and discussion

We developed our NGS-based polymorphism detection procedure using two previously described quantitative trait loci (QTLs), namely the barley flowering time locus VrnH1 (Fu et al. 2005; Szucs et al. 2007; von Zitzewitz et al. 2005) and the grain protein content locus HvNAM1 (Cai et al. 2013; Uauy et al. 2006) (Fig. 1). Primers were designed to define amplicons of 146–268 bp spanning or flanking the polymorphisms of interest (Fig. 2). The 5.2-kb indel polymorphism at the VrnH1 locus (Fig. 1a) was amplified by competitive PCR with two reverse primers: one specific for the insert and one specific for the downstream 3′ sequence flanking the insert (Fig. 2a). Amplification of the entire insert with the forward and reverse flanking primers was prevented by limiting the PCR elongation time to 30 s, which is not enough to generate a full-size product of >5.4 kb because the polymerization rate of a standard PCR is approximately 1500 bp/min (Roche 2011), corresponding to roughly 750 bp in 30 s. A similar approach was used by Fritsch et al. (2015) to verify the presence of transgenic events in maize. The 42- and 17-bp indels at the VrnH1 locus were amplified using primers flanking the indels (Fig. 2b). The same strategy was used for the three SNPs at the HvNAM1 locus. Primers with barcodes (Table S2 of online resource) and adapters (Table S3 of online resource) were used to reduce the number of sample preparation steps: they generate short PCR products that cover the indels and SNPs and are linked to two adapters and one sample-specific barcode, so that the products can be sequenced directly without prior library preparation. The preparation of a library involves fragmentation of the target DNA followed by end repair, adapter ligation, purification and size selection (Fig. 3) (Life Technologies 2015; Thermo Fisher Scientific 2014) and in our experience is time-consuming and expensive, especially when processing a large number of samples. Furthermore, we amplified the three polymorphisms at each locus (Fig. 1) simultaneously in a multiplex reaction, rather than individually in three singleplex reactions, to limit the number of pipetting steps and therefore reduce consumables expenditure. If more genetic loci were to be investigated in a specific plant line, the PCR could potentially be multiplexed to a greater extent by adding more primer pairs to the reaction vessel. The multiplexing capacity is nevertheless limited to the point at which all target PCR products are still successfully amplified, which would have to be tested individually for each multiplex reaction. Capillary electrophoresis was carried out using an Agilent 2100 Bioanalyzer to confirm the success of the reactions and to quantify the products. This is particularly important because the PCR products need to be diluted to exactly 26 pM for successful sequencing (Life Technologies 2013). Capillary electrophoresis showed that the PCRs were successful and that the products had the anticipated sizes (data not shown). The samples were pre-diluted to 100 pM according to the capillary electrophoresis results. To reduce the number and cost of pipetting steps even further, 2 µl of each pre-diluted sample was pooled and processed collectively. The sample pool was diluted to the required concentration, and thereby also purified, using a magnetic bead-based procedure with the Ion Library Equalizer Kit, therefore requiring only a single reaction tube and set of reagents for the entire pool.
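The reasoning behind the elongation-time limit in the competitive PCR can be checked with a short calculation. The sketch below uses the polymerization rate of approximately 1500 bp/min and the 30-s elongation step stated above; the function names are illustrative, and the linear model ignores any variation in polymerase speed.

```python
def max_product_bp(elongation_s: float, rate_bp_per_min: float = 1500.0) -> float:
    """Longest product that can be completed within one elongation step."""
    return elongation_s / 60.0 * rate_bp_per_min


def can_amplify(product_bp: int, elongation_s: float = 30.0) -> bool:
    """True if a product of this size fits into the elongation step."""
    return product_bp <= max_product_bp(elongation_s)


print(max_product_bp(30.0))   # longest product for a 30-s elongation step
print(can_amplify(146))       # shortest amplicon in the assay
print(can_amplify(268))       # longest amplicon in the assay
print(can_amplify(5400))      # full-size product spanning the insertion
```

Under this simple model, all amplicons of 146–268 bp are completed well within the elongation step, whereas the product spanning the entire insertion is not.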

Fig. 2

PCR strategies to amplify polymorphisms of interest. Horizontal bars show the DNA; horizontal arrows represent primers and the direction of elongation. a PCR strategy to amplify the flanking sequence of the 5.2-kb insertion at the VrnH1 locus. The orange and black striped arrows illustrate the use of two distinct reverse primers in a competitive PCR. b PCR strategy to amplify the small indels in the VrnH1 locus and the SNPs in the HvNAM1 locus. (Color figure online)

Fig. 3

Schematic depiction and comparison of the workflow with either (a) genomic DNA/long PCR products or (b) short PCR products of 200–400 bp with attached barcodes and adapters. Working steps are shown as required to apply next-generation sequencing to the samples of interest, based on the designated PCR strategy and starting material. The crossed out steps in (b) are omitted when PCR is carried out with barcoded primers amplifying 200- to 400-bp fragments

The reads generated by NGS were aligned to the VrnH1 and HvNAM1 reference sequences, allowing the genotypes to be clearly distinguished (Table S4 of online resource). For the VrnH1 gene, the three anticipated insertions were detected in all reads representing the Strider line, revealing the winter growth genotype of this QTL. The anticipated deletions were detected in all reads representing the Morex line, confirming the spring growth genotype of the QTL in this line. Insertions at these three locations correspond to winter barley alleles, whereas deletions correspond to spring barley alleles (Fu et al. 2005; Szucs et al. 2007; von Zitzewitz et al. 2005). In the Morex × Strider hybrid, the insertions and deletions were distributed in approximately equal shares among the reads (Table S4 of online resource) and the loci could therefore be identified as heterozygous. The genotype of the 5.2-kb indel could be detected by counting the reads aligned either to the 5′ part of the insertion or to the downstream flanking sequence of the anticipated insertion (Fig. 4) because competitive PCR was carried out with three primers (Fig. 2a). The Strider line exclusively generated reads matching the 5′ sequence of the insertion, whereas the Morex line exclusively generated reads matching the downstream flanking sequence. These results show that the Strider line contains the 5.2-kb insertion which is not present in the Morex line. The Morex × Strider hybrid generated reads aligning to both reference sequences, confirming that both indel alleles are present. The 42- and 17-bp indels were detected as gaps in the sequence. The sequencing results for the 17-bp indel are shown as an example in Table 1.
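The way read counts translate into a genotype at the 5.2-kb indel can be sketched as a simple decision rule. The thresholds below (minimum depth, maximum minor-allele fraction for a homozygous call and minimum minor-allele fraction for a heterozygous call) are assumptions chosen for illustration and were not part of our analysis, and the read counts in the example are hypothetical.

```python
def call_indel_genotype(
    insertion_reads: int,
    deletion_reads: int,
    min_depth: int = 30,           # assumed minimum number of reads for a call
    max_minor_hom: float = 0.05,   # assumed tolerance for stray reads
    min_minor_het: float = 0.30,   # assumed lower bound for a heterozygote
) -> str:
    """Classify a sample from reads aligned to the insertion or the flank."""
    total = insertion_reads + deletion_reads
    if total < min_depth:
        return "no call"
    minor_fraction = min(insertion_reads, deletion_reads) / total
    if minor_fraction <= max_minor_hom:
        if insertion_reads > deletion_reads:
            return "homozygous insertion"
        return "homozygous deletion"
    if minor_fraction >= min_minor_het:
        return "heterozygous"
    return "ambiguous"


# Hypothetical read counts (insertion, deletion) for three templates
for name, counts in {"A": (1200, 0), "B": (0, 950), "C": (510, 490)}.items():
    print(name, call_indel_genotype(*counts))
```

The same rule could be applied to the small indels and SNPs by counting the reads that carry each alternative allele.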

Fig. 4

Allelic identification at the 5.2-kb indel site of the VrnH1 gene, showing the number of reads aligned to the 5′ sequence of the insertion, or to the flanking sequence indicating a deletion. a Reads on the Strider template. b Reads on the Morex template. c Reads on the heterozygous Morex × Strider template. Insertion: reads aligned to the 5′-insert sequence. Deletion: reads aligned to the sequence flanking the insertion site

Table 1 Sequencing results for (a) the 17-bp indel in the VrnH1 gene and (b) the SNP at nucleotide position 234 in exon 1 for the HvNAM1 gene. (Color figure online)

The three SNPs of interest in the HvNAM1 gene (Fig. 1b) were also detected by sequencing. The Clipper line contains guanosine residues at nucleotide positions 234 in the first exon and 544 in the second exon, whereas cytidine residues occupy both positions in the Karl line. In the third exon, the Clipper line contains a guanosine residue at nucleotide position 1433, whereas the Karl line contains an adenosine residue at this site. The Clipper alleles correspond to a high grain protein content, whereas the Karl alleles correspond to a low grain protein content phenotype (Cai et al. 2013; Uauy et al. 2006). The Karl × Clipper hybrid showed a near-equal distribution of the two alternative nucleotides in each position (Table S4 of online resource), confirming the heterozygosity of the hybrid at this locus. The sequencing results for the SNP at nucleotide position 234 are shown as an example in Table 1. An overview of the sequencing results and read numbers is given in the online resource (Table S4).

The different numbers of reads aligned to the reference sequences of each locus (Fig. 4; Table S4 of online resource) may reflect the uneven amplification efficiency of the multiplex PCR, which can be caused by differences in primer binding efficiency, the favored amplification of a specific target or the formation of primer dimers that inhibit amplification (Le et al. 2009). Furthermore, read errors/low-quality reads are excluded from the final dataset. This often occurs when reads are automatically trimmed or filtered out, e.g., when they are polyclonal or produce an off-scale signal on the Ion Torrent server (Life Technologies 2014). However, there was no need to normalize or equalize the PCRs in our method because a few reads are theoretically sufficient to confirm the presence of a given allele by mapping to a unique reference sequence. In heterozygous samples, those reads should be distributed in a near-equal manner. Although specific limits have not been proposed, higher read numbers are known to reduce the error frequency significantly (Sims et al. 2014), and we therefore propose that a coverage of at least 30 reads per PCR product is desirable.
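The effect of read depth on the detection of both alleles in a heterozygous sample can be illustrated with a simple binomial model. The sketch assumes that each read samples either allele with a probability of 0.5 and that reads are independent, which ignores sequencing errors and allele-specific amplification bias; the function name and the minimum read count per allele are illustrative.

```python
from math import comb


def prob_both_alleles_seen(depth: int, min_reads_per_allele: int = 1) -> float:
    """Probability that a heterozygote yields at least
    min_reads_per_allele reads of each allele at the given depth,
    assuming an unbiased 50:50 sampling of the two alleles."""
    favourable = sum(
        comb(depth, k)
        for k in range(min_reads_per_allele, depth - min_reads_per_allele + 1)
    )
    return favourable / 2 ** depth


for depth in (5, 10, 30):
    missed = 1.0 - prob_both_alleles_seen(depth)
    print(f"depth {depth}: probability of missing one allele = {missed:.2e}")
```

Under these assumptions, the probability of missing one allele entirely falls rapidly with increasing depth, which is consistent with the preference for a minimum coverage rather than only a few reads.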

We have demonstrated that single nucleotide polymorphisms and indel polymorphisms of different sizes can be characterized by NGS in terms of genotype and zygosity. Our assay is therefore useful in the context of barley breeding because diverse polymorphisms in several genes can be investigated simultaneously. The duration and cost of the assay are reduced by the multiplex PCR with barcoded adapters, so that all samples, from different plants and with distinct adapters, can be pooled for parallel dilution, purification and sequencing. So far, 96 different barcodes are available and have been described (Elshire et al. 2011; Life Technologies 2015), enabling the analysis of 96 individual samples per gene of interest in one sequencing run. Therefore, when a large sequencing chip such as the Ion 318 (Life Technologies) with a capacity of 80 million reads (Life Technologies 2015) is used, and the read number is optimized to 30 parallel reads per PCR product, it would be possible to screen more than 27,000 individual PCR products representing polymorphisms in specific loci using 96 different samples or plant lines. These numbers were calculated by dividing the read capacity by the number of barcodes and the desired read depth, which gives the number of PCR products that can potentially be screened per sample (80 million ÷ 96 ÷ 30 = 27,777.78). This large capacity makes the assay suitable for the high-throughput analysis of genotypes. It should also be possible to increase the number of barcodes as soon as they are defined in the sequencing software, so that even more samples could be processed in a single run. In general, our method can be carried out on any sequencing device that allows amplicon sequencing. Therefore, the number of screenable PCR products depends on the capacity and properties of the sequencer used. As demonstrated by Campbell et al. (2015), who established an NGS assay with 100-bp reads to genotype rainbow trout (Oncorhynchus mykiss) for SNP markers, a different PCR strategy with two thermal cycling steps could also be used to increase the number of samples.
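The capacity estimate above can be reproduced, and adapted to other chips or read depths, with a short calculation. The function name is illustrative, and the estimate is theoretical because it assumes that all reads are usable and evenly distributed among samples and PCR products.

```python
def screenable_products(read_capacity: int, barcodes: int, depth: int) -> float:
    """Theoretical number of PCR products per barcoded sample
    that can be covered at the desired read depth in one run."""
    return read_capacity / barcodes / depth


estimate = screenable_products(read_capacity=80_000_000, barcodes=96, depth=30)
print(f"{estimate:,.2f} PCR products per sample")
```

In practice, the uneven amplification efficiency of the multiplex PCR and the removal of low-quality reads would reduce this number.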

Many parallel reads of the polymorphisms in each of the finely mapped genes of interest provide robust data about the genotype and therefore allow the phenotype to be predicted. Plants can be selected as soon as they have grown enough to extract DNA, usually 1–2 weeks after planting. It has yet to be determined whether this assay is suitable for polyploid crops such as potato and wheat, and whether it can be implemented in plant species with large genomes and/or large numbers of transposable elements (Choulet et al. 2010; Schnable et al. 2009). To design ideal primers enabling the PCR amplification of desired polymorphic regions, preliminary investigations of relevant traits and associated gene loci as well as polymorphisms affecting the phenotype are necessary. PCRs should be designed using the chosen reference sequences so that products are generated in most cases, or at least in individuals carrying the desired variants. Additional polymorphisms in more diverse barley lines, especially those within primer binding sites, are not addressed by the assay. A failed PCR therefore simply yields no sequencing result, and such plant lines are excluded from the selection process. To reduce the number of failed PCRs in such cases, several primers could be designed in close proximity to the original primer binding site to increase the probability of successful primer annealing. Degenerate (wobble) primers could also be used if certain mutations within the primer binding sites are known. However, the straightforward implementation of our assay and its consistent results address the limitations of other modern plant breeding techniques such as MAS (Jonas and de Koning 2013; Nakaya and Isobe 2012). Sample preparation is simple, and library preparation steps such as fragmentation, sizing and adapter ligation can be omitted, thus providing an advantage over other genotyping-by-sequencing approaches (Deschamps et al. 2012; Elshire et al. 2011; He et al. 2014). Although the assay relies on prior knowledge concerning the sequence of polymorphic sites and the distribution of alleles linked to certain phenotypes, as long as association studies and the fine mapping of traits continue to be used in plant breeding, our assay could nevertheless be adopted widely as a new tool for high-throughput selection in many species of crops and other plants.