Introduction

Opium poppy (Papaver somniferum L.) produces several pharmaceutically important alkaloids and is an economically important member of the family Papaveraceae (Acharya and Sharma 2009). Benzylisoquinoline alkaloids are extracted from opium poppy and have extensive medicinal properties, including analgesic and narcotic (morphine), antitumor (noscapine), antitussive (codeine) and muscle relaxant (papaverine) effects (Facchini and De Luca 2008; Ziegler et al. 2009; Winzer et al. 2012). In addition, poppy seeds and their oil are edible (Schulz et al. 2004). Although Turkey currently ranks first in the world in terms of surface area of cultivated opium poppy fields with 49,000 hectares (48 % of the world total), the country ranks second in total world morphine production, with 150 tons (18 % of the world total), after Australia (23 %), due to the low morphine content of Turkish poppy (Turkish Soil Product Office 2009). An important contributing factor to the use of low-morphine cultivars is a lack of poppy-specific molecular tools for characterization of poppy germplasm and more efficient breeding (Gumuscu and Arslan 2008).

To date, molecular characterization of opium poppy has mainly involved the use of non-specific markers, such as amplified fragment length polymorphism (AFLP) (Saunders et al. 2001; Dittbrenner et al. 2008), random amplified polymorphic DNA (RAPD) and inter-simple sequence repeat (Acharya and Sharma 2009; Parmaksiz and Ozcan 2011) markers. The most comprehensive analysis of genetic diversity of opium poppy was done by Dittbrenner et al. (2008), who analyzed the genetic diversity of 300 accessions from the opium poppy world collection using AFLP markers. These authors also quantified major alkaloids (morphine, codeine, thebaine, papaverine and noscapine) in the accessions by high performance liquid chromatography. Straka and Nothnagel (2002) constructed the only genetic linkage map currently available for opium poppy using 77 AFLP and 48 RAPD markers. Thus, it is clear that opium poppy-specific markers would be a valuable contribution to diversity and mapping studies.

Simple sequence repeat (SSR) or microsatellite markers are iterations of one to six nucleotide motifs and are dispersed throughout the genome (Varshney et al. 2005; Jones et al. 2009). Although genic SSR markers are gene targeted and more conserved than genomic SSR markers, they are reported to have lower copy number and lower polymorphism information content (PIC) than genomic SSR markers (Li et al. 2002; Ellis and Burke 2007). Opium poppy-specific markers have recently been developed using the publicly available expressed sequence tag (EST) database [National Center for Biotechnology Information (NCBI), Bethesda, MD] to identify and design SSR primers (Lee et al. 2011; Selale et al. 2013). Lee et al. (2011) developed 22 opium poppy-specific genic SSR markers but tested only six of these for molecular characterization of opium poppy. Selale et al. (2013) tested 67 EST–SSR primer pairs on Turkish opium poppy accessions, landraces and cultivars. In these two latter studies, the markers were validated for use in forensic identification of opium poppy (Lee et al. 2011) and genetic diversity analysis (Selale et al. 2013). In both cases, opium poppy-specific genic SSR markers revealed low genetic diversity. Given the low polymorphism and limited number of the poppy-specific genic SSR markers currently available, the aim of our study was to develop opium poppy-specific genomic SSR markers with pyrosequencing technology. Pyrosequencing is a low-cost, high-throughput sequencing method which can expedite SSR discovery (Abdelkrim et al. 2009). As part of this goal, we tested a subset of the SSR markers for polymorphism and transferability in opium poppy and related species. This validation of the markers was important to determine their usefulness for fingerprinting, diversity, mapping and breeding studies in opium poppy and related species.

Materials and methods

Plant materials and DNA isolation

A total of 37 opium poppy accessions from Turkey and seven other Papaver species were used as plant material (Table 1). Eight opium poppy accessions were obtained from the Turkish Soil Product Office (TMO) and 29 accessions were obtained from the Anatolia Agricultural Research Institute (AARI), Eskisehir, Turkey. The related species were used to test transferability of the markers and included Papaver orientale (Iran), P. pseudoorientale (Iran), P. bracteatum (Iran), P. rhoeas (Bulgaria), P. umbonatum (Turkey), P. nudicaule (Mongolia) and P. armeniacum (Armenia). These accessions were obtained from the U.S. Department of Agriculture-Agricultural Research Service Plant Germplasm Inspection Station, Beltsville MD, USA. For genomic DNA isolation, each accession was planted in seedling plates. Plants were grown in the greenhouse (24–25 °C, approximately 33 % humidity). Total genomic DNA was isolated from leaf tissue bulked from ten plants per accession using a CTAB method (Doyle and Doyle 1990).

Table 1 Opium poppy accessions and Papaver species used in the study

DNA sequencing

For sequencing, total genomic DNA of P. somniferum cv. ‘Kemerkaya’ was extracted using the Wizard Magnetic 96 Plant System (Promega Corp., Madison, WI) with the Beckman Coulter Biomek NX Workstation according to the manufacturer’s instructions. Pyrosequencing was done with a Roche 454 GS-FLX sequencer and was performed by 454 Lifesciences Corp. (Branford, CT).

Data acquisition and pre-processing

The adapter and linker sequences were removed to facilitate genome assembly. Following this, the Standard Flowgram Format (SFF) data were converted to separate FASTA (Lipman and Pearson 1985) and quality files because most assembly tools cannot work directly on SFF files (Kumar and Blaxter 2010). The conversion was performed using an open-source package of tools written in the Python language, which is available at http://bioinf.comav.upv.es/seq_crumbs/download.html. We used the seq_crumbs tool from the package to perform the conversion with the default settings. The resulting FASTA and FASTQ format files were suitable for sequence assembly.

Sequence assembly

MIRA, a whole genome shotgun and EST sequence assembler (Chevreux et al. 2004), was used for sequence assembly because it allowed us to customize the assembly process in great detail and led to the best assembly among more than 100 trials with MIRA and four other assembly tools (Gultekin and Allmer, in preparation). Assembly quality was based on various parameters, such as the weighted median of contig lengths (N50), a commonly used measure. The most successful assembly by MIRA used non-default parameters which are described in Electronic Supplementary Material (ESM) 1. The sequences will be provided in a future publication as an annotated draft genome (Gultekin et al. in preparation).
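As an illustration of the N50 measure mentioned above, the following is a minimal Python sketch and not the procedure used by MIRA or by the authors. The function name `n50` and the toy contig lengths are hypothetical; the definition assumed here is the common one, namely the contig length at which the cumulative sum of lengths, sorted from longest to shortest, first reaches half of the total assembled bases.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Assumed definition: the length L such that contigs of length >= L
    together contain at least half of all assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contig lengths given")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Toy example with made-up lengths (not data from this study).
toy_lengths = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_lengths)
```

Because longer contigs are weighted by the bases they contain, a few long contigs raise N50 more than many short ones, which is why it is often preferred over the plain median contig length when comparing assemblies.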

SSR detection and primer design

The analysis of SSRs of the contig assemblies was performed with our in-house tool SiSeeR (http://bioinformatics.iyte.edu.tr/index.php?n=Softwares.SiSeeRHelp) (Gultekin and Allmer, in preparation). The minimum numbers of repeats needed to identify perfect SSRs were: ten mononucleotide, six dinucleotide, four trinucleotide, four tetranucleotide, three pentanucleotide and three hexanucleotide repeats. Primer design was performed with the Primer3 (Rozen and Skaletsky 2000) (http://frodo.wi.mit.edu/) console application. A total of 19,046 contig sequences yielding 23,427 SSR sequences were converted from FASTA to the default Primer3 input format Boulder-IO. The Primer3 settings, which differed from the default settings, are described in ESM 1. In order to produce primers flanking the SSR sequences, Primer3's SEQUENCE_TEMPLATE switch was enabled, and values for the start and end positions of each SSR were generated. SSR and primer design data are available at http://plantmolgen.iyte.edu.tr/research/.
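The repeat thresholds and the conversion to Boulder-IO described above can be sketched in Python as follows. This is not SiSeeR, whose implementation is not described here; it is a simple regular-expression scan under the stated minimum repeat numbers. The function names are hypothetical, and the use of Primer3's SEQUENCE_TARGET tag (start, length) to mark the SSR is an assumption: the text only states that SEQUENCE_TEMPLATE was set and that SSR start and end positions were generated. Coordinates are 0-based and end-exclusive.

```python
import re

# Minimum repeat counts per motif length, as listed in the text.
MIN_REPEATS = {1: 10, 2: 6, 3: 4, 4: 4, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_perfect_ssrs(sequence):
    """Return sorted (start, end, motif, repeat_count) tuples for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # One motif copy followed by at least (min_rep - 1) identical copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        pos = 0
        while pos < len(sequence):
            m = pattern.match(sequence, pos)
            if m and is_primitive(m.group(1)):
                copies = (m.end() - m.start()) // motif_len
                hits.append((m.start(), m.end(), m.group(1), copies))
                pos = m.end()
            else:
                pos += 1
    return sorted(hits)


def boulder_record(seq_id, sequence, ssr_start, ssr_end):
    """Format one Boulder-IO record asking for primers that flank the SSR."""
    return (
        f"SEQUENCE_ID={seq_id}\n"
        f"SEQUENCE_TEMPLATE={sequence}\n"
        f"SEQUENCE_TARGET={ssr_start},{ssr_end - ssr_start}\n"
        "=\n"
    )


# Toy contig (made up for illustration, not from the assembly).
toy_contig = "GGC" + "AT" * 6 + "CCT" + "AAG" * 4 + "T"
toy_hits = find_perfect_ssrs(toy_contig)
toy_record = boulder_record("toy_contig_1", toy_contig, toy_hits[0][0], toy_hits[0][1])
```

Rejecting non-primitive motifs keeps a dinucleotide run such as ATATATAT from being counted again as a tetranucleotide repeat. A real pipeline would also need to handle ambiguous bases and whatever base-index convention the installed Primer3 version expects.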

SSR amplification

Amplification of the opium poppy DNA with genomic SSR primers was carried out in 25-μl reaction mixtures containing 1 × PCR buffer, 3 mM MgCl2, 0.125 mM deoxyribonucleotide triphosphates (dNTPs), 1 U Taq Polymerase, 2 pmol forward and reverse primers and 80 ng template DNA. The PCR cycling profile consisted of one cycle of 5 min at 94 °C, followed by 35 cycles of 45 s at 94 °C, 1 min at 55 °C (annealing) and 1 min at 72 °C, with a final extension step of 5 min at 72 °C. To prepare the PCR product for analysis by capillary electrophoresis, 3 μl of the PCR product was added to 27 μl of sample loading buffer (Beckman Coulter, Brea, CA). A size standard (0.5 μl, 600 bp; Beckman Coulter) was used per reaction in all runs. The mixture for each accession was then run on a Beckman CEQ8800 capillary electrophoresis device using the frag3 method (capillary temperature 50 °C, denaturation 90 °C for 120 s, injection voltage 2.0 kV for 3 s, separation voltage 4.8 kV for 60 min).

Data analysis

Amplified SSR loci were scored as present (1) or absent (0). Rare PCR fragments (<4 occurrences) were excluded from analysis because they might be unreliable. The binary data were used to calculate the PIC value for each marker fragment based on the formula $\mathrm{PIC}_i = 2f_i(1 - f_i)$, where $f_i$ is the frequency of band presence for fragment $i$ (Roldan-Ruiz et al. 2000). A dissimilarity matrix generated using the Dice coefficient was used to construct a dendrogram with the unweighted neighbor joining method using the Darwin computer program (Perrier and Jacquemoud-Collet 2006). The correlation of the dissimilarity matrix and the dendrogram was calculated using a Mantel test.
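The two calculations on the binary data can be illustrated with a short Python sketch. It is not the Darwin implementation; the function names and toy band profiles are hypothetical. The Dice dissimilarity is assumed to take its usual form, 1 − 2a/(2a + b + c), where a is the number of fragments shared by two accessions and b and c are the numbers of fragments found in only one of them.

```python
def pic(band_column):
    """PIC for one fragment: 2 * f * (1 - f), f = frequency of band presence."""
    f = sum(band_column) / len(band_column)
    return 2 * f * (1 - f)


def dice_dissimilarity(profile_a, profile_b):
    """Dice dissimilarity between two 0/1 fragment profiles of equal length."""
    shared = sum(1 for x, y in zip(profile_a, profile_b) if x == 1 and y == 1)
    total_bands = sum(profile_a) + sum(profile_b)
    if total_bands == 0:
        # Neither accession shows any band; treated as identical here (a convention).
        return 0.0
    return 1 - 2 * shared / total_bands


# Toy data (made up): one fragment scored in four accessions,
# and the fragment profiles of two accessions.
toy_pic = pic([1, 1, 0, 0])
toy_dice = dice_dissimilarity([1, 1, 0, 1], [1, 0, 1, 1])
```

With this formula the PIC of a dominant-type fragment reaches its maximum of 0.5 when the band is present in half of the accessions, which sets the upper bound for the values reported for these markers.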

Results

Sequence assembly, SSR identification and primer design

Pyrosequencing resulted in 1,244,412 sequence reads covering more than 695 Mb (Table 2). After removal of the linkers and adapters, nearly 475 Mb remained, with an average sequence read length of 380.4 nucleotides (nt). A total of 649,267 reads representing 52 % of all sequences were assembled into 166,724 contigs encompassing 105 Mb of the opium poppy genome. The weighted median contig length (N50) of the assembly was 913 nt. Only contigs were used for SSR detection because singlet reads were shorter and, therefore, would not be as useful for designing effective primers.

Table 2 Sequence preprocessing and assembly statistics

A total of 23,283 non-redundant SSRs were identified in 18,944 contigs (11.3 % of total contigs). Approximately 16 % (3,135) of contigs contained more than one SSR (ESM 2). The highest numbers of different SSRs found in a single contig were ten and 11, each observed in one contig. SSR length ranged from 3 to 226 nt, with an average length of 13.4 ± 0.03 nt. The most abundant repeat types were trinucleotide SSRs, which accounted for 49.0 % of all SSRs (Table 3), followed by tetranucleotide repeats (27.9 %). The remaining repeat types each accounted for <10 % of the SSRs. Some motifs were more common than others (Table 4). The majority (82.2 %) of mononucleotide repeats consisted of A/T repeats. AT/TA was the most abundant dinucleotide repeat motif and accounted for 50.4 % of these repeats. AAG/TTC was the most abundant (19.7 %) trinucleotide repeat. AT-rich repeats were also the most common repeats among tetra-, penta- and hexanucleotide SSRs. Primers were designed for 23,126 SSRs; only 1.3 % of the SSR sequences were unsuitable for primer design.

Table 3 Simple sequence repeat types in the opium poppy contig sequences
Table 4 Most abundant SSR motif types in contigs

Polymorphism, validation and transferability of genomic SSR markers

A total of 100 genomic SSR primer pairs were first tested on five opium poppy accessions (1290, 1061, 1259, 1065, cv. ‘Kemerkaya’). Of these primers, 96 (96 %) amplified products. A total of 53 genomic SSR markers which showed clear amplification following electrophoresis on agarose gel (ESM 3) were then tested in the 37 opium poppy accessions and seven Papaver species for determination of polymorphism and transferability (Table 1). Seventeen of the opium poppy accessions were named varieties while the others were landraces collected in Turkey. The 53 SSR primers generated 209 polymorphic fragments in all accessions and 90 polymorphic fragments in P. somniferum accessions (Table 5). The average number of amplified fragments per genomic SSR marker was 5 ± 0.01, with a range of 1–13 fragments. A total of 48 SSR primers (95 %) were polymorphic in all accessions, with an average fragment polymorphism of 84 %. For all accessions, average PIC values ranged from 0.05 for psgSSR076 to 0.47 for psgSSR022, with an average PIC of 0.19. Fewer (60.4 %) SSR primers showed intraspecific polymorphism in P. somniferum, with an average fragment polymorphism of 63 %; the average intraspecific PIC value decreased to 0.17. Intraspecific PIC values ranged from 0.05 for five different markers to 0.49 for psgSSR022. In all analyses there was no significant correlation between PIC values and SSR motif types or lengths.

Table 5 Polymorphism information content of genomic markers

To ensure that the expected SSRs were amplified by the primers, we sequenced seven PCR products from cv. ‘Kemerkaya’. Only one of the products contained an unexpected SSR. Although these results indicate imperfect validation of the identified SSRs, the fact that the amplification product contained a polymorphic SSR is sufficient for use of this particular primer pair as a molecular marker.

Transferability of the 53 genomic SSR markers was tested in seven Papaver species: P. bracteatum, P. pseudoorientale, P. orientale, P. nudicaule, P. armeniacum, P. rhoeas and P. umbonatum. A high rate of transferability was observed in these species (ESM 4). All 53 SSR markers yielded PCR products in P. pseudoorientale, 52 (98 %) yielded PCR products in P. bracteatum and P. nudicaule, 51 (96 %) yielded PCR products in P. orientale and P. armeniacum, 49 (92 %) yielded PCR products in P. umbonatum, and 47 (88.7 %) yielded PCR products in P. rhoeas.

Genetic diversity analysis with genomic SSR markers in opium poppy

Low-frequency fragments (observed in <10 % of poppy accessions) were excluded from all analyses because these low-frequency fragments can be unreliable. A total of 209 high-quality, reproducible polymorphic fragments were used for our diversity analysis of opium poppy and related species. The binary presence/absence data were used to generate a distance matrix using the Dice coefficient to draw a dendrogram employing the unweighted neighbor-joining algorithm. As expected, a Mantel test showed a strong correlation (r = 0.998) between the distance matrix and dendrogram. Dissimilarity between accessions ranged from 0.008 to 0.48 (52 % similarity), with average dissimilarity of 0.14 (Fig. 1). Dissimilarity between opium poppy and related species ranged from 0.23 to 0.48. As expected, Papaver species clustered separately from P. somniferum accessions. The P. somniferum accessions fell into three clusters. Cluster 1 contained 13 landraces and 13 varieties/breeding lines, and the dissimilarity ranged from 0.01 to 0.08, with an average dissimilarity of 0.04. Cluster 1 had six subclusters (subclusters A–F), with cluster 1B containing only named varieties and cluster 1C containing only landraces of P. somniferum; varieties and landraces were intermixed in the other subclusters. Cluster 2 contained eight opium poppy accessions (four landraces and four varieties) and had an average dissimilarity of 0.06, with minimum and maximum dissimilarities of 0.03 and 0.16, respectively. Cluster 3 comprised three opium poppy landraces (59, 22 and 76) which were the most genetically distinct opium poppy accessions in the study. Clustering of the Papaver species did not match that obtained with the internal transcribed spacer nuclear ribosomal DNA and plastid trnL sequences (Carolan et al. 2006).

Fig. 1 Unweighted neighbor-joining dendrogram of the poppy accessions constructed by genomic simple sequence repeat markers

Discussion

Simple sequence repeat markers are frequently used in plant genetic studies because they are easy to amplify, highly reproducible, polymorphic and often multiallelic (Varshney et al. 2005). In the past, detection of genomic SSRs and subsequent conversion to markers was expensive and time-consuming, involving the construction and screening of genomic DNA libraries followed by the sequencing of candidate clones (Zalapa et al. 2012). The advent of next-generation sequencing technologies, such as pyrosequencing, has redefined this process. Much of the work is now performed in silico with wet laboratory experiments confined to sequencing and SSR validation. As a result, many more SSR markers can be identified quickly and at a lower cost. This approach is especially promising for minor crop, tree and weed species that have often been ignored in the area of molecular marker development [e.g. cranberry (Zhu et al. 2012), black alder (Lepais and Bacles 2011) and waterhemp (Lee et al. 2009)].

Distribution of SSRs in the opium poppy genome

We used high-throughput pyrosequencing to develop genomic SSR markers in opium poppy. The assembly of the relatively long sequences (average length 731 nt in assembly) resulted in more than 160,000 contigs covering 105 Mb of the opium poppy genome. The contigs provided 2.83 % coverage of the opium poppy genome, which has been reported to contain 3,724 Mb of DNA (Bennett and Smith 1976). As expected, this is much higher than the coverage obtained from EST unigene assembly (0.4 %; Selale et al. 2013) and provided a sevenfold increased coverage of the opium poppy genome. Average density of genomic SSRs was one SSR every 4.5 kb of genomic DNA, which is in the range expected for plant species (Cavagnaro et al. 2010). Genomic SSRs occurred less frequently than non-redundant genic SSRs (every 3.6 kb in EST sequences; Selale et al. 2013) in the opium poppy genome. This difference is in concurrence with the results of Morgante et al. (2002) and of the same magnitude (1.25-fold) as the difference in frequency observed in rice, soybean and sorghum (1.2- to 2-fold, Cavagnaro et al. 2010).
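The coverage and density figures in this paragraph can be checked with a few lines of Python. The inputs are the rounded values reported in the text (105 Mb of contigs, a 3,724 Mb genome, 23,283 non-redundant SSRs), so the results agree with the reported numbers only to within rounding; the variable names are illustrative.

```python
assembled_mb = 105      # Mb of contig sequence (Results, rounded)
genome_mb = 3724        # Mb, reported genome size (Bennett and Smith 1976)
ssr_count = 23283       # non-redundant SSRs identified in contigs

# Share of the genome represented by the contigs, in percent.
coverage_pct = 100 * assembled_mb / genome_mb

# Average spacing of SSRs in the assembled sequence, in kb per SSR.
kb_per_ssr = assembled_mb * 1000 / ssr_count

# Ratio of genomic to genic SSR spacing (4.5 kb versus 3.6 kb per SSR).
spacing_ratio = 4.5 / 3.6

# Fold increase in coverage over the EST unigene assembly (2.83 % versus 0.4 %).
coverage_fold = 2.83 / 0.4
```

With these inputs the coverage comes out at about 2.8 %, the spacing at about 4.5 kb per SSR, the spacing ratio at 1.25 and the coverage increase at roughly sevenfold, in line with the values quoted above.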

In our study, trinucleotides were the most prevalent SSR type in opium poppy genomic DNA, accounting for nearly half (48.7 %) of all SSRs identified. Trinucleotide repeats are also the most frequently identified SSR type in Arabidopsis, rice, soybean and sorghum genomic DNA (Cavagnaro et al. 2010). The frequency distribution of the other SSR types in opium poppy also matched that of other species, with the frequency being tetranucleotide >dinucleotide >pentanucleotide (Cavagnaro et al. 2010). The frequencies of SSRs in opium poppy genomic DNA were similar to those observed in EST sequences, with the exception of mononucleotide repeats which ranked fifth in genomic DNA and third in genic DNA (Selale et al. 2013). The abundance of trinucleotides in genic DNA is hypothesized to be the result of purifying selection which eliminates any SSRs causing frameshift mutations. However, it is unknown if selection is involved in the distribution of SSR types in genomic DNA.

Among the different motif types, AT-rich motifs were often the most common. This has also been observed in genomic DNA of other dicot plant species (Cavagnaro et al. 2010) as well as in opium poppy genic DNA (Selale et al. 2013). AAG/TTC was the most common trinucleotide motif, which is in agreement with results reported for other dicots, including cucumber, soybean, Arabidopsis and grape (Cavagnaro et al. 2010). AAG/TTC was also the most frequent trinucleotide detected by Selale et al. (2013) in their study of opium poppy genic sequences and was described as the most frequent genic trinucleotide in plants by Li et al. (2004).

The similarities between genomic and genic SSR types and motifs in opium poppy may, in part, be due to the presence of coding sequences in the genomic DNA contig assembly. In future studies, the contig sequences could be annotated to determine the percentage of protein-coding DNA and to determine any redundancy in the SSRs identified in our two studies (i.e. present study and Selale et al. 2013).

Polymorphism and transferability of genomic SSR markers

Most of the SSR primers (96 %) amplified PCR products. The amplification success of the genomic SSR markers was slightly higher than that for the genic SSR markers (82 %) developed by Selale et al. (2013). A high rate of successful amplification can be due to high-quality sequence data and the appropriate primer parameters, such as high GC content. In our study, the genomic SSR markers detected an intermediate level of polymorphism, with an average PIC value of 0.19 among the Papaver species and opium poppy accessions, and a slightly lower level of intraspecific polymorphism (average PIC 0.17). Polymorphism of the genomic SSR markers was lower than that of previously developed genic SSR markers (Lee et al. 2011; Selale et al. 2013). Although genomic SSRs are often reported to have higher levels of polymorphism than genic SSRs (Varshney et al. 2005), Tian et al. (2012) recently showed that genic SSR markers were more polymorphic than genomic SSR markers in Coreoperca whiteheadi.

Many of the genomic SSR markers in our study amplified multiple fragments (average of 5 fragments per marker). This is most likely the result of polyploidy in opium poppy. Papaver somniferum (2n = 22) is an aneuallopolyploid and is hypothesized to have originated from species with x = 7 (Lavania and Srivastava 1999). Therefore, a single copy SSR locus may amplify up to six fragments. In our study, nine SSR markers (17 %) had more than six fragments, indicating that these SSR markers originate from multiple loci (these markers are labeled with an asterisk in Table 5). In our study, the number of fragments amplified by genomic SSR markers was slightly higher than that amplified by six genic SSR markers developed by Lee et al. (2011), who obtained an average of 2.8 ± 0.5 fragments (mean ± standard error; range 2–5). However, in their study, Lee et al. (2011) did not use markers that produced more than three fragments, thereby limiting fragment number. In contrast, the average fragment number for the genomic SSRs was lower than that for the genic SSRs tested by Selale et al. (2013), which amplified an average of 8.4 fragments per SSR. This result can be explained by the fact that many of the genic SSR markers (23 SSR markers) were multiallelic while only nine genomic SSRs were multiallelic in this study. Additional genic and genomic markers should be tested to determine if this difference is real or a sampling artifact.

High transferability of the genomic SSR markers to related Papaver species was observed, varying from 88.7 to 100 % depending on the species tested. A high transferability rate has also been reported for genic SSR markers in opium poppy (Selale et al. 2013). This high rate of transferability indicates that there are conserved regions among Papaver species. High transferability of both types of SSR markers is valuable for Papaver species given the limited molecular tools available for this genus.

Genetic diversity analysis with genomic SSR markers

The genomic SSR markers developed in this study can be used in opium poppy identification. Although rare alleles were excluded, a total of 32 SSR primers (60.4 %) identified in our study were useful for Turkish opium poppy identification. Retesting of rare alleles is needed to confirm whether or not these are reproducible and suitable for opium poppy identification. Dendrogram analysis also showed that genomic SSR markers were suitable for differentiating opium poppy from Papaver species. Thus, these markers can be used to analyze intra- and interspecific genetic diversity of opium poppy. Landraces and named varieties were intermixed in the dendrogram of genomic SSR markers. This differs from the results reported by Selale et al. (2013) obtained with nearly the same accessions using genic SSR markers in which named varieties clustered separately from landraces. This difference may be the result of artificial selection pressure on genic SSRs. Chabane et al. (2005) reported that genic SSR markers provided clearer separation between wild and cultivated barley than genomic SSR markers, as was observed in our studies. Although the topology of the dendrogram for genomic SSR markers was different from the dendrogram based on genic SSR markers, Mantel test results showed that there was a very high correlation (r = 0.98) between the distance matrices for the two marker types. Therefore, genomic and genic SSR markers give consistent results in opium poppy.

In conclusion, pyrosequencing and mining of the opium poppy genome allowed identification of numerous SSR repeats which can be exploited for marker development. These markers are highly transferable within the genus Papaver and are an important addition to the repertoire of molecular tools available for genetic studies and breeding in opium poppy.