Introduction

Safflower (Carthamus tinctorius L.) is an oilseed crop with diverse industrial and pharmaceutical uses (Ambreen et al. 2015; Kumar et al. 2016). It originated in the Fertile Crescent region about 4000 years ago (Ambreen et al. 2015). Demand for safflower has risen sharply in various industries, where it is used as a lubricant and biofuel, in soap and varnish making, as cooking oil, and in bird feed (Golkar 2014; Kumar et al. 2016). It is a natural source of food flavors and cosmetic dyes (Khalid et al. 2017). Safflower seed oil has a high content of unsaturated fatty acids (linoleic and oleic acid), which have health-promoting properties for the treatment of trauma, heart attack, and renal thrombosis disorders (Al-Snafi 2015). Safflower leaf is a potential source of more than one hundred bioactive substances with a wide variety of pharmaceutical and industrial applications (Khalid et al. 2017).

Iran is considered a rich source of Carthamus species (Barati and Arzani 2012; Golkar 2014). In a more recent classification based on EST-SSR molecular markers, Chapman et al. (2010) determined the presence of five genetic clusters in safflower germplasm (1: Europe; 2: Turkey–Iran–Iraq–Afghanistan; 3: Israel–Jordan–Syria; 4: Egypt–Ethiopia; and 5: the Far East–India–Pakistan).

According to the first classification, however, safflower was domesticated in seven distinct centers of origin: the Far East, India and Pakistan, the Middle East (Iran–Afghanistan, Israel–Jordan–Iraq–Syria, and Turkey), Egypt, Sudan, Ethiopia, and Europe (Knowles 1969).

Evaluating genetic variation with various markers is decisive for the exploitation and conservation of genetic resources in improvement programs (Poczai et al. 2013; Kiran et al. 2017). Broad genetic resources are vital for supplying valuable alleles in a changing environment (Poczai et al. 2013; Bahmankar et al. 2017).

Genetic diversity in plant germplasm is evaluated with different markers (biochemical, morphological, and molecular), but genetic variation at the DNA level is of great importance in genetic studies (Peng et al. 2008; Bahmankar et al. 2017). In this regard, molecular markers, which carry information about genealogy, genetic disorders, population structure, and the evolutionary history of the plant, are considered an effective biotechnological tool for evaluating genetic variation (Sehgal and Raina 2005; Poczai et al. 2013). Molecular marker assays offer higher polymorphism and precision, and they are independent of the environment and the plant growth stage (Poczai et al. 2013).

The genetic diversity of safflower accessions has shrunk during domestication and, therefore, their flexibility against threatening conditions, such as environmental stresses, has significantly decreased (Yang et al. 2007; Mayerhofer et al. 2011). To date, different molecular markers have been used for genetic studies in safflower, such as genetic diversity analysis and quantitative trait loci mapping of important agronomic and physiological traits, including SSR (Ambreen et al. 2015; Kiran et al. 2017; Bahmankar et al. 2017), EST-SSR (Barati and Arzani 2012; Derakhshan et al. 2014), AFLP (Kumar et al. 2016), ISSR (Golkar et al. 2011; Yaman et al. 2014; Bahmankar et al. 2017), RAPD (Khan et al. 2009), and SRAP (Vilatersana et al. 2005; Peng et al. 2008; Mokhtari et al. 2013). However, genome-based studies of safflower have been restricted because fewer molecular markers are available for this plant than for other major oil crops such as canola and sunflower.

According to Li and Quiros (2001), the sequence-related amplified polymorphism (SRAP) technique is a highly reproducible DNA marker that can be used for different purposes, including genetic variation (Robarts and Wolfe 2014), quantitative trait loci mapping (Zhang et al. 2006), map construction (Li and Quiros 2001), and cDNA fingerprinting and map-based cloning (Robarts and Wolfe 2014). The advantages of SRAP markers, such as simplicity, revealing of numerous co-dominant markers, and targeting of open reading frames (ORFs) (Uzun et al. 2009; Robarts and Wolfe 2014), have prioritized it over other molecular markers. Evaluation of the genetic diversity between and within the Carthamus species has been done by SRAP (Peng et al. 2008; Mokhtari et al. 2013).

Start codon targeted (SCoT) polymorphism, as a new dominant and/or co-dominant DNA marker, is characterized by the short conserved region flanking the start codon (ATG) (Collard and Mackill 2009). It can generate more information and better marker resolvability than random DNA markers, such as RAPD and ISSR (Mulpuri et al. 2013; Deng et al. 2015; Etminan et al. 2016), without any prior sequence information and by employing longer primers (18-mer) (Xiong et al. 2011). The SCoT markers have been used largely and successfully for the evaluation of phylogenetics (Xiong et al. 2011), genetic diversity (Satya et al. 2015), and fingerprinting (Xiong et al. 2011; Mulpuri et al. 2013).

Different molecular markers, especially random markers, have been used alone or in combination to assess genetic diversity across different plant families (Mulpuri et al. 2013; Hajibarat et al. 2015). Combining molecular markers is useful for the fine analysis of genetic diversity, phylogenetics, and fingerprinting in different plant species (Etminan et al. 2016). Combined molecular markers have been applied in different plant species, such as SCoT and ISSR in Cicer (Amirmoradi et al. 2012) and durum wheat (Etminan et al. 2016); SSR and SCoT in chickpea (Hajibarat et al. 2015); ISSR and SRAP in Galega officinalis L. (Wang et al. 2012); and SSR and SRAP in orchard grass (Xie et al. 2010). Knowledge of the similarities and differences among safflower germplasm could prove valuable for implementing improvement strategies. A literature review showed that SCoT markers have not thus far been used to study genetic variation in safflower genotypes. This study therefore combines SRAP and SCoT molecular markers to detect genetic variation among 100 safflower genotypes from 30 countries in order to (1) assess the similarity centers and (2) define the patterns of genetic diversity and differentiation within safflower accessions.

Materials and methods

Plant materials

One hundred safflower genotypes, including commercial cultivars, breeding genotypes, and landraces originating from different countries of the world, were used in this study (Table 1). The world distribution of the genotypes is presented in Figure S1. The exotic genotypes were obtained from the Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Germany, and the native breeding genotypes were deposited at the Agricultural Research Center of Isfahan (IUT), Iran. During the growth period, young leaves were transferred to the laboratory and stored in a −80 °C freezer for further experiments.

Table 1 The collected safflower germplasm used in this study from the different geographical regions of the world

Extraction of genomic DNA

The young leaves were used for DNA extraction according to a modified CTAB protocol described by Murray and Thompson (1980). DNA concentration was measured with a NanoDrop spectrophotometer (NanoDrop Technologies Inc., Wilmington, DE, USA), and DNA was diluted to a working concentration of 50 ng/µl.
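The dilution to the 50 ng/µl working concentration follows the usual C1·V1 = C2·V2 relation. The sketch below is only an illustration: the function name, the 400 ng/µl stock reading, and the 100 µl final volume are hypothetical values, not figures from this study.

```python
def dilution_volumes(stock_ng_per_ul, final_ng_per_ul=50.0, final_volume_ul=100.0):
    """Return (stock volume, diluent volume) in µl using C1*V1 = C2*V2.

    final_ng_per_ul defaults to the 50 ng/µl working concentration in the text;
    the final volume is an assumed, illustrative value.
    """
    if stock_ng_per_ul < final_ng_per_ul:
        raise ValueError("stock is already below the working concentration")
    stock_ul = final_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


# Hypothetical spectrophotometer reading of 400 ng/µl
stock_ul, diluent_ul = dilution_volumes(stock_ng_per_ul=400.0)
print(f"{stock_ul:.1f} µl stock + {diluent_ul:.1f} µl diluent")
```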

Molecular marker analysis

SRAP–PCR amplification

SRAP markers were analyzed according to the procedure described by Li and Quiros (2001) with minor modifications. All SRAP primer combinations were initially screened on a group of samples according to Uzun et al. (2009) (Table 2). Twelve primer combinations with scorable polymorphic bands were used for all accessions. PCR reactions were performed in a total volume of 15.0 µl as follows: 8 µl of Master Mix (Amplicon), 2 µl of DNA template, 2 µl of forward and reverse primers (10 μmol/l), and 3 µl of ddH2O. The SRAP markers were amplified with the following parameters: 5 min at 94 °C; 5 cycles of 94 °C for 1 min, 35 °C for 1 min, and 72 °C for 2 min; 30 cycles of 94 °C for 1 min, 50 °C for 1 min, and 72 °C for 2 min; and a final extension of 5 min at 72 °C. The amplification products were analyzed by electrophoresis on a 2% (w/v) MetaPhore agarose (Lonza) gel using 1× TBE running buffer and detected by staining with ethidium bromide.
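The SRAP reaction mix and cycling profile can be written down as data and checked for internal consistency. This is a minimal sketch: the dictionary and function names are hypothetical, the 2 µl primer volume is assumed to be the combined forward plus reverse volume (the reading under which the components sum to 15 µl), and the run time counts hold times only, not ramping.

```python
# Per-reaction volumes in µl, taken from the protocol text.
# Assumption: 2 µl is the combined forward + reverse primer volume.
SRAP_MIX_UL = {
    "master_mix": 8.0,
    "dna_template": 2.0,
    "primers_fwd_plus_rev": 2.0,
    "ddH2O": 3.0,
}

# Each stage: number of cycles and a list of (temperature in °C, seconds).
SRAP_PROGRAM = [
    {"cycles": 1, "steps": [(94, 300)]},                        # initial denaturation
    {"cycles": 5, "steps": [(94, 60), (35, 60), (72, 120)]},    # low-stringency cycles
    {"cycles": 30, "steps": [(94, 60), (50, 60), (72, 120)]},   # higher-stringency cycles
    {"cycles": 1, "steps": [(72, 300)]},                        # final extension
]


def total_volume(mix):
    """Sum of all component volumes (µl)."""
    return sum(mix.values())


def hold_time_minutes(program):
    """Total programmed hold time in minutes (ramp times are ignored)."""
    seconds = sum(
        stage["cycles"] * sum(duration for _, duration in stage["steps"])
        for stage in program
    )
    return seconds / 60


def scale_mix(mix, n_reactions, overage=0.10):
    """Volumes for a batch of reactions with a pipetting overage (hypothetical default 10%)."""
    factor = n_reactions * (1.0 + overage)
    return {name: round(volume * factor, 2) for name, volume in mix.items()}


print(total_volume(SRAP_MIX_UL))        # 15.0
print(hold_time_minutes(SRAP_PROGRAM))  # 150.0
```

Under these assumptions the components add up to the stated 15 µl, and the programmed holds amount to 150 min per run.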

Table 2 The molecular marker characteristics used to assess genetic variation in different safflower accessions

SCoT–PCR amplification

Twelve polymorphic SCoT primers (Table 2) were also used to analyze and verify the genetic diversity among safflower genotypes according to Collard and Mackill (2009). PCR reactions were performed in 15 μl volumes containing 8 μl of Master Mix (Amplicon), 20 ng of genomic DNA, and 10 μmol/l primers. SCoT-PCR amplification was performed in a gradient thermocycler (Bio-Rad, USA) with an initial denaturation of 5 min at 95 °C, followed by 35 cycles of 40 s at 95 °C, 40 s at 50 °C, and 2 min at 72 °C, and a final extension of 7 min at 72 °C. The amplification products were analyzed by electrophoresis on a 2% (w/v) MetaPhore agarose (Lonza) gel using 1× TBE running buffer and detected by staining with ethidium bromide.

Data analysis

SRAP and SCoT amplicons were scored in a binary matrix as present (1) or absent (0) for each sample, with band sizes estimated against a size standard (100 bp DNA Ladder) (Collard and Mackill 2009). Faint bands that could not be scored reliably were excluded. POPGENE ver 1.32 (Yeh et al. 1999) was used to compute the different diversity indices.
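The presence/absence scoring can be illustrated with a small sketch. The genotype labels, band sizes, and function names below are hypothetical toy values, not data from this study; the point is only how a 0/1 matrix is built and how polymorphic bands are counted.

```python
def score_bands(observed, band_positions):
    """Build a 0/1 matrix: one row per genotype, one column per band position."""
    return [
        [1 if size in bands else 0 for size in band_positions]
        for bands in observed.values()
    ]


def polymorphic_columns(matrix):
    """Indices of bands present in some genotypes and absent in others."""
    columns = list(zip(*matrix))
    return [j for j, col in enumerate(columns) if 0 < sum(col) < len(col)]


# Hypothetical band sizes (bp) read against a 100 bp ladder
band_positions = [200, 350, 500, 800]
observed = {
    "G1": {200, 350, 800},
    "G2": {200, 500, 800},
    "G3": {200, 350},
}

matrix = score_bands(observed, band_positions)
poly = polymorphic_columns(matrix)
percent_polymorphic = 100.0 * len(poly) / len(band_positions)

print(matrix)               # [[1, 1, 0, 1], [1, 0, 1, 1], [1, 1, 0, 0]]
print(percent_polymorphic)  # 75.0
```

In this toy example the 200 bp band is present in every genotype and is therefore monomorphic, so three of the four bands (75%) are polymorphic.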

Analysis of molecular variance (AMOVA) was performed to determine genetic differentiation among and within populations using Arlequin v 3.1 (Excoffier et al. 1992). Cluster analysis was based on the standardized Jaccard’s similarity index (Jaccard 1908), and dendrograms were constructed in NTSYS-pc by the unweighted pair group method with arithmetic mean (UPGMA). To add complementary information to the cluster analysis, principal coordinate analysis (PCoA) was used to construct 3D eigenvectors with the DCENTER module of NTSYS version 2.0 (Rohlf 1992).
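The logic of Jaccard similarity followed by UPGMA clustering can be sketched in plain Python. This is an illustration under stated assumptions, not the NTSYS-pc implementation: the four band profiles are hypothetical, the distance is taken as 1 minus the Jaccard similarity, and two profiles with no bands at all are treated as identical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two 0/1 profiles: shared bands / bands present in either."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return both / either if either else 1.0  # assumption: empty profiles count as identical


def upgma(names, profiles):
    """Return the merge history as (members_a, members_b, distance) tuples."""
    base = {
        (i, j): 1.0 - jaccard(profiles[i], profiles[j])
        for i, j in combinations(range(len(names)), 2)
    }

    def dist(c1, c2):
        # UPGMA: arithmetic mean of all pairwise distances between cluster members
        pairs = [base[(min(i, j), max(i, j))] for i in c1 for j in c2]
        return sum(pairs) / len(pairs)

    clusters = [(i,) for i in range(len(names))]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merges.append(
            (tuple(names[i] for i in a), tuple(names[i] for i in b), dist(a, b))
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical band profiles for four genotypes
names = ["G1", "G2", "G3", "G4"]
profiles = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]

for left, right, d in upgma(names, profiles):
    print(left, right, round(d, 3))
```

In this toy data G1 joins G2 and G3 joins G4 at a distance of about 0.333, and the two pairs merge at 0.75, the mean of the four between-pair distances.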

Results and discussion

SRAP analysis

In this study, 12 of the 30 primer combinations showed polymorphism (Table 3). The amplified bands ranged in size from 80 to 1200 base pairs. Ambiguous bands at the upper and lower positions of the gel were excluded because of their faintness and inconsistent amplification. The number of amplicons produced by each primer set ranged from 6 (Me3–Em2 and Me5–Em4) to 14 (Me5–Em1), with an average of 9.41 amplicons per primer set, and the number of polymorphic amplicons ranged from 4 (Me5–Em4) to 13 (Me5–Em1) (Table 3). Of the 113 amplicons, 88 were polymorphic, an average of 7.3 polymorphic bands per primer combination (fewer than 10) (Table 3). This finding was similar to reports in Coffea arabica (Kumar Mishra et al. 2011), Citrus (Uzun et al. 2009), and G. officinalis L. (Wang et al. 2012). The percentage of polymorphic bands produced by each primer pair ranged from 55.55% (Me5–Em6) to 100% (Me4–Em1), with a mean of 77.28% (Table 3). The average polymorphism information content (PIC), an index of the information gained from each marker, was 0.35 (Table 3). The lowest (0.10) and highest (0.68) PIC values were obtained for the Me4–Em6 and Me5–Em6 primer combinations, respectively (Table 3). The SRAP markers fell in the medium PIC range (0.25 < PIC < 0.5) in C. tinctorius (Xie et al. 2010) as well as in other species such as Lolium multiflorum (Huang et al. 2014) and G. officinalis L. (Wang et al. 2012). In addition, different PIC values have been reported for SSR (0.30) (Ambreen et al. 2015) and SRAP (0.37–0.57) markers (Peng et al. 2008; Mokhtari et al. 2013) in Carthamus.
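The summary figures for the SRAP markers can be re-derived from the reported totals (113 amplicons, 88 polymorphic, 12 primer combinations). The short check below also illustrates, with two hypothetical primer sets, why a mean of per-primer percentages (such as the reported 77.28%) need not equal the pooled percentage computed from the totals.

```python
# Totals reported in the text
total_amplicons = 113
polymorphic_amplicons = 88
primer_combinations = 12

mean_amplicons = total_amplicons / primer_combinations          # about 9.42 per primer set
mean_polymorphic = polymorphic_amplicons / primer_combinations  # about 7.33 per primer set
pooled_percent = 100 * polymorphic_amplicons / total_amplicons  # about 77.88%

# Hypothetical per-primer counts: (total bands, polymorphic bands)
toy_primers = [(9, 5), (5, 5)]
mean_of_percentages = sum(100 * p / t for t, p in toy_primers) / len(toy_primers)
pooled_toy = 100 * sum(p for _, p in toy_primers) / sum(t for t, _ in toy_primers)

print(round(mean_amplicons, 2), round(mean_polymorphic, 2), round(pooled_percent, 2))
print(round(mean_of_percentages, 2), round(pooled_toy, 2))  # 77.78 vs 71.43
```

The pooled value from the totals is about 77.88%, close to but not identical with the mean of the per-primer percentages, because primer sets that amplify more bands carry more weight in the pooled figure.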

Table 3 Polymorphism characteristics revealed by SRAP molecular markers in safflower genotypes

SCoT analysis

A total of 36 SCoT primers were screened for polymorphism in the safflower accessions. Of these, 11 primers exhibited distinct and reliable band patterns (Table 4). The number of amplified bands per primer ranged from 7 (SCoT 31) to 14 (SCoT 30), with an average of 10.36 bands per primer, whereas the number of polymorphic amplicons ranged from 2 (SCoT 19) to 11 (SCoT 30), with an average of 6.18 amplicons per primer. The percentage of polymorphism varied from 37.5% (SCoT 35) to 100% (SCoT 27), with 3–11 polymorphic bands per primer (Table 4). Polymorphism information content ranged from 0.12 (SCoT 35) to 0.49 (SCoT 3 and SCoT 12), with an average of 0.31 (Table 4). The amplified band sizes ranged from 100 to 2000 bp, and most fell between 500 and 1500 bp. Previous reports found different polymorphism percentages, such as 90.24% in orchard grass (Xie et al. 2010), 96.6% in Diospyros (Deng et al. 2015), and 66.67% in peanut (Xiong et al. 2011), which demonstrated the high reproducibility of SCoT markers in genetic diversity studies.

Table 4 Polymorphism characteristics revealed by SCoT primer pairs in safflower genotypes

Analysis of molecular variance

The SRAP and SCoT data were combined to obtain more accurate genetic estimates. Genetic variance among and within populations was partitioned by analysis of molecular variance (AMOVA) (P < 0.05). The inbreeding coefficient within subpopulations (FIS) can range from −1.0 (all individuals are heterozygous) to +1.0 (no observed heterozygous individuals) (Wright 1984), and the fixation index (FST), which measures the extent of genetic differentiation among subpopulations, can range from 0.0 (no differentiation) to 1.0 (complete differentiation, with subpopulations fixed for different alleles) (Wright 1984). The results indicate the existence of heterozygosity (FIS = −0.65) and differentiation (FST = 0.16) in the C. tinctorius population. Most of the variation was attributed to the within-individual component (Table 5). Because safflower has a relatively high cross-pollination rate of about 30–35%, this high within-individual variation is expected.
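For orientation, the two indices can be related to heterozygosities using the standard textbook definitions. This is only an interpretive sketch: Arlequin estimates the indices from AMOVA variance components, and the symbols H_O (observed heterozygosity), H_S (mean expected heterozygosity within subpopulations), and H_T (expected heterozygosity in the total population) are introduced here for illustration and do not appear in the original analysis.

```latex
F_{IS} = 1 - \frac{H_O}{H_S}, \qquad F_{ST} = \frac{H_T - H_S}{H_T}

% Reading the reported values through these definitions:
F_{IS} = -0.65 \;\Rightarrow\; \frac{H_O}{H_S} = 1 - F_{IS} = 1.65

F_{ST} = 0.16 \;\Rightarrow\; \frac{H_S}{H_T} = 1 - F_{ST} = 0.84
```

Under these definitions, a negative FIS corresponds to more heterozygotes than expected within subpopulations, and an FST of 0.16 corresponds to 16% of the total expected heterozygosity lying among subpopulations.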

Table 5 Analysis of molecular variance (AMOVA) in safflower genotypes grouped according to populations

AMOVA has also been performed in safflower with other markers such as EST-SSR (Barati and Arzani 2012) and ISSR (Golkar et al. 2011). A high within-population diversity indicates that these safflower populations should be conserved in their natural habitat to preserve genetic diversity (Satya et al. 2015). The results demonstrated medium genetic differentiation and reveal the potential of the new marker (SCoT) in basic evolutionary research, such as map construction and population genetics (Ellis and Burke 2007).

Genetic diversity indices and genetic similarity analysis

Different diversity indices, including polymorphism information content (PIC), Shannon’s information index (I), number of alleles (Na), effective number of alleles (Ne), and expected heterozygosity (He), were estimated using the appropriate population-genetic formulas (Yeh et al. 1999). PIC was calculated as $PIC_i = 2f_i(1 - f_i)$ (Anderson et al. 1993), where $f_i$ is the frequency of the ith amplified band. The percentage of polymorphism was calculated by dividing the number of polymorphic bands by the total number of amplified bands. The Shannon index (I), an information statistic used as a measure of gene diversity, was calculated as $I = -\sum_{i=1}^{s} p_i \ln p_i$, where $p_i$ is the proportion n/N, n is the number of polymorphic loci, N is the total number of loci, ln is the natural logarithm, Σ denotes summation, and s is the number of species (Shannon 1948). Populations with a higher Shannon index have higher diversity. Expected heterozygosity (He, or Nei’s genetic diversity) was estimated as $H_e = 1 - \sum_i p_i^2$, where $p_i$ is the frequency of the ith allele at a locus (Hedrick 1999); it ranges between 0 and 1 and is maximized when there are many alleles at equal frequencies. The number of alleles (Na) counts the alleles with non-zero frequency (Hedrick 1999). The effective number of alleles, which is the reciprocal of the expected homozygosity, was calculated as $N_e = 1/\sum_i p_i^2$ (Nei 1983); sample size has a large effect on this parameter.
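The indices described above can be computed for a single locus as in the following sketch. It is an illustration rather than the POPGENE implementation: the function names and the example allele frequencies (0.7 and 0.3) are hypothetical, and expected heterozygosity is taken in Nei's standard form, one minus the sum of squared allele frequencies.

```python
import math


def pic_dominant(band_frequency):
    """PIC for a dominant marker band: 2f(1 - f)."""
    return 2.0 * band_frequency * (1.0 - band_frequency)


def shannon_index(freqs):
    """Shannon information index: -sum(p * ln p), skipping zero frequencies."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def expected_heterozygosity(freqs):
    """Nei's gene diversity: 1 - sum(p^2)."""
    return 1.0 - sum(p * p for p in freqs)


def effective_alleles(freqs):
    """Effective number of alleles: 1 / sum(p^2), the reciprocal of expected homozygosity."""
    return 1.0 / sum(p * p for p in freqs)


def observed_alleles(freqs):
    """Number of alleles with non-zero frequency."""
    return sum(1 for p in freqs if p > 0)


# Hypothetical allele frequencies at one locus
freqs = [0.7, 0.3]
print(round(pic_dominant(0.3), 3))                # 0.42
print(round(shannon_index(freqs), 3))             # 0.611
print(round(expected_heterozygosity(freqs), 3))   # 0.42
print(round(effective_alleles(freqs), 3))         # 1.724
print(observed_alleles(freqs))                    # 2
```

For a two-allele locus, both the band-based PIC and the expected heterozygosity reach their maximum of 0.5 at equal frequencies, where the effective number of alleles equals the observed number (2).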

Diversity indices were calculated for five distinct regions (Table 6). The highest Shannon index (0.42) and number of observed alleles (Na = 1.87) were found in Europe, which suggests that gene flow or stable mutations were predominant in the European habitats of safflower. The highest expected heterozygosity (He = 0.29) was found in the European and American groups. The highest value of Ne (1.52) was observed in the Far East (Table 6). The means of Na, Ne, He, and the Shannon index were 1.66, 1.44, 0.26, and 0.35, respectively, which indicates relatively high polymorphism in these genotypes (Table 6). The population from Africa exhibited lower genetic polymorphism than the other populations. The similarity coefficient among accessions ranged from 0.53 to 1, with a mean of 0.76 (data not shown). Different genetic similarities have been reported previously among C. tinctorius populations (Zhang et al. 2006; Peng et al. 2008; Mokhtari et al. 2013). Different factors, such as variation in climate, genome structure, and environmental effects, could be involved in these differences (Sehgal and Raina 2005; Barati and Arzani 2012; Hajibarat et al. 2015), but identifying a subset of accessions that represents the highest possible genetic distance from the whole collection would enable better exploitation of genetic resources for breeding as well as the mining of useful genes (Reeves et al. 2012). The genetic similarity analysis leads to the same conclusion as Mokhtari et al. (2013), Kiran et al. (2017), and Bahmankar et al. (2017): there is a high level of genetic diversity among the different genotypes of C. tinctorius.
The highest similarity coefficients may reflect a common origin, possibly as a result of a genetic mutation that occurred in an initial variety (Chapman et al. 2010; Mokhtari et al. 2013). It is not surprising that the cultivated accessions were close to each other (genetic similarity > 0.80), which is similar to the report of Peng et al. (2008); lower values have been reported across different Carthamus species (Mokhtari et al. 2013).

Table 6 Summary of genetic variation statistics in different C. tinctorius genotypes with combined two different molecular markers (SRAP and SCoT)

Principal coordinate analysis (PCoA)

Principal coordinate analysis was performed on the genetic similarity matrix to visualize the relationships among safflower accessions, and the accessions were plotted in two dimensions according to their positions on the first two coordinate axes (Fig. 1). The first three eigenvectors explained 49.22% of the total molecular variation, accounting for 18, 15, and 14% of the observed variation, respectively. PCoA and UPGMA analyses resulted in similar groupings in this study (Figs. 1, 2). The similarity of genotypes grouped in common clusters could be due to common parentage, convergent evolution, and subsequent selection (Barati and Arzani 2012).
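The first computational step of PCoA is the double-centering of the squared distance matrix, after which the eigenvalues of the centered matrix give the variation carried by each axis. The sketch below shows only the double-centering step in plain Python; it is not the NTSYS DCENTER module, the conversion of similarity to distance as 1 minus similarity is an assumption, and the three-point example (points at 0, 1, and 3 on a line) is hypothetical.

```python
def similarity_to_distance(sim):
    """Assumed conversion: distance = 1 - similarity."""
    return [[1.0 - s for s in row] for row in sim]


def double_center(dist):
    """Gower double-centering: B = -1/2 * J * D^2 * J for a symmetric distance matrix."""
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in a]
    grand_mean = sum(row_mean) / n
    # The matrix is symmetric, so column means equal row means.
    return [
        [a[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


# Hypothetical distances among three points located at 0, 1 and 3 on a line
dist = [
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
]
b = double_center(dist)
trace = sum(b[i][i] for i in range(len(b)))
print(round(trace, 4))  # total variation; equals the sum of the eigenvalues of B
```

The share of variation explained by an axis is its eigenvalue divided by this trace; the eigen-decomposition itself is best left to a numerical library.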

Fig. 1 Principal coordinate analysis (PCoA) of the 100 safflower genotypes based on combined SRAP and SCoT markers (genotype numbers follow the coding used in the manuscript)

Fig. 2 UPGMA dendrogram of 100 Iranian safflower genotypes based on SRAP genetic similarities; the geographic origin of each genotype is indicated by symbols

Cluster analysis

The cluster analysis dendrogram categorized the 100 genotypes into five clusters (A, B, C, D, and E) based on 166 polymorphic SRAP and SCoT fragments (Fig. 2). The UPGMA results corresponded to the centers of similarity. Group A contained accessions that were predominantly from the Middle East (Iran, Iraq, Turkey, and Tajikistan). In this study, the Turkish accession was close to the other Middle Eastern accessions; Chapman et al. (2010) likewise grouped Turkish accessions with all the accessions from Iran and Afghanistan using microsatellite markers. Cluster B consisted of accessions that were almost entirely from the Far East (India, Pakistan, and Korea). This finding confirmed the data of Chapman et al. (2010), but it differed from the classification of Knowles (1969). These discrepancies could be attributed to the different molecular markers and genotypes used in the different studies.

Most of the accessions in Group C were from Europe, except for one accession from the USA, two from Paraguay, one from Portugal, one from Morocco, and all the Russian genotypes. It seems that the Russian genotypes are closer to the European genotypes than to the Eastern accessions. Group D included accessions from the American continent, and Group E included Egyptian, Sudanese, and Libyan accessions. Cluster analysis revealed the high power of the combined molecular markers to discriminate accessions from distinct geographical regions. Moreover, an accession of unknown origin (Car200) was placed among the Far East genotypes. The majority of American accessions (from Canada, Mexico, and the USA) were grouped into one cluster, which suggests that these genotypes share a common ancestor. Only one of the accessions from the USA and all the genotypes from Paraguay belonged to Cluster C. The results of cluster analysis showed a considerable connection between geographical origin and genetic variation among safflower accessions. However, there were certain inconsistencies between the countries of origin of the genotypes and their placement in subgroups, which could be due to germplasm exchange across regions (Vilatersana et al. 2005), a high percentage of cross-pollination (Khan et al. 2009), or stable genetic mutations (Poczai et al. 2013). This result is similar to the results of Barati and Arzani (2012) and Golkar et al. (2011), who reached the same conclusion using different molecular markers. Considering the problems of the current classifications in the Carthamus genus (Vilatersana et al. 2005), the novel SCoT and SRAP markers will also be beneficial in accelerating the taxonomic classification of this genus. However, to confirm the pattern found in safflower genotypes, it is essential to use a larger number of accessions from each geographical location.

Conclusion

Broadening the genetic base of breeding materials in safflower requires exploiting the diverse genetic resources available in the world. Developing newer molecular markers is beneficial for the identification of genetic diversity, interspecies crossbreeding, and the production of new cultivars in safflower. In this study, the genetic diversity of safflower genotypes was evaluated with SCoT molecular markers for the first time. Both SRAP and SCoT markers showed high efficiency for the identification of genetic diversity in safflower accessions, but SRAP markers were slightly more efficient. This study illustrated a high level of genetic diversity in safflower genotypes and found that genetic variance mainly exists within populations. The UPGMA clustering showed that molecular classification was correlated with the geographical distribution of genotypes. Highly polymorphic SCoT and SRAP primers would be useful for DNA fingerprinting, population structure analysis, reconstruction of the breeding history of the domesticated genotypes, and the effective management of genetic resources in safflower. These findings will help future selection programs to produce new cultivars and interspecies crossbreeding lines. Whether the polymorphic SCoT bands originate from functional regions requires deeper analysis through methods such as sequencing. Conversion of these diagnostic markers into SCARs is suggested for future studies.