Introduction

Cotton is a leading textile fiber, with an annual economic impact of $600 billion worldwide [1]. The fiber is produced by cotton plants, which are perennial woody shrubs and trees in their wild form, although cultivated cotton is primarily grown as an annual [2]. Cotton belongs to the family Malvaceae and the genus Gossypium, which comprises nearly 50 species distributed across tropical, subtropical, and semi-arid regions worldwide. Eight genome types (A to G, and K) are found in cotton (Gossypium spp.). There are currently four cultivated species, of which two are diploid (2n = 2x = 26) and two are allotetraploid (2n = 4x = 52). The cultivated diploid (AA) species are G. herbaceum and G. arboreum, while the cultivated allotetraploid (AADD) species are G. hirsutum and G. barbadense [3,4,5]. The allotetraploids dominate global production, particularly G. hirsutum, also known as upland cotton. G. hirsutum accounts for about 96–97% and G. barbadense for 2–3% of the total worldwide cotton cultivation area, whereas only 1% of the world’s cotton farmland is used to grow G. arboreum and G. herbaceum [3, 4, 6]. The world population is estimated to reach 11 billion in 2050, increasing the need for food, feed, fiber, and fuel; it is therefore critical to increase agricultural production at least two- to three-fold. Multiple biotic and abiotic stresses associated with climate change have also caused a decline in global agricultural production. These concerns highlight the urgent need to increase crop productivity. Given limited resources, it is also critical to make effective use of existing genetic diversity [7].

Due to the economic importance of cotton in several countries, cotton breeding programs focus primarily on improving fiber quality and yield, as well as early maturity and pest and disease resistance. Breeders are also working to increase drought, heat, and salt stress tolerance in cotton varieties to help mitigate the effects of climate change and global warming [6, 8]. World cotton fiber production in the 2019–2020 season increased slightly (approximately 1%) compared with the previous year and reached 25,929 thousand tons, and it was reported as 24,800 thousand tons for 2020–2021 (Dec.). India, which has surpassed China in cotton production in recent years, took first place by providing approximately 23% of the world’s production in the 2019–2020 season, followed by China, the USA, and Brazil [9, 10]. Meanwhile, cotton fiber production in Turkey in the 2019–2020 season fell to 751 thousand tons, a decrease compared with the previous season, placing the country in seventh place, after Uzbekistan [10]. Turkey is on the list of the top 10 cotton-producing countries, ranking eighth after India, China, the United States of America, Pakistan, Brazil, Australia, and Uzbekistan, but its consumption demand is still higher than its domestic production. As a result, Turkey imports 50% of its cotton, which makes it one of the major cotton importers after Bangladesh, Vietnam, and China [1, 6]. Cotton is a summer cash crop (sown in March/April and harvested in September/October) that is cultivated in three main regions of Turkey: the Aegean, Mediterranean (mainly Antalya and Cukurova), and Southeastern Anatolia [6, 11]. Cotton faces several challenges and constraints in these regions, resulting in low lint yield and fiber quality. One of them is extreme heat stress (temperatures above 30 °C) in the Southeastern Anatolia region from around mid-July to mid-August. 
Furthermore, in major irrigated agricultural areas, poor irrigation management has resulted in significant salt accumulation in soils [6]. During the hot and dry summer in the Aegean region, irrigation is a major challenge in cotton cultivation, especially for G. hirsutum L. [12]. These circumstances and challenges are urging breeders to develop drought-, heat-, and salt-tolerant cotton varieties.

The abundance of genetic diversity in germplasm resources is critical to the success of any breeding program. Knowledge of the genetic relationships among plant genotypes is essential for gaining insight into the complexity of existing germplasm, uncovering genotype differences, and developing sustainable germplasm conservation plans. It is particularly difficult to develop molecular markers for cotton because of its low level of intraspecific DNA polymorphism. In addition, cotton’s tetraploid genome is relatively large, containing 2200–3000 Mb of DNA [8]. Molecular markers that have been used to evaluate cotton diversity include AFLP, ISSR, RAPD, and SSR [3, 13,14,15,16,17]. Recent advances in molecular markers and genome sequencing have created great potential for assessing genetic diversity in large germplasm collections. For instance, inter-primer binding site (iPBS) markers have recently become one of the most widely used marker systems for determining genetic diversity and relationships in numerous plants [18], and they have been successfully used in a wide range of genetic diversity studies [19,20,21,22,23,24,25,26,27]. This PCR-based marker system was developed by [18] based on transposable elements, particularly the long terminal repeat (LTR) retrotransposons. One of its benefits is that it does not require prior knowledge of the genome sequence. During retrotransposon replication, the 3′-terminal sequence of a tRNA binds to the conserved primer binding site (PBS) located close to the 5′ LTR and primes reverse transcription. The iPBS primers are designed to match these conserved PBS sequences and thus amplify the regions between them. Because of its effectiveness in PCR amplification and its capacity to screen broad sections of plant genomes, this dominant marker system is well suited to genetic diversity analyses of large numbers of plant genotypes [21, 29]. 
This research aimed to explore the genetic diversity of cotton germplasm using the iPBS-retrotransposon marker system, which will contribute to determining population structure and easing the task of cotton breeders, particularly Turkish breeders, in the context of abiotic stresses.

Materials and methods

Plant material

The cotton plant materials used in this study came from the genetic collections of the Nazilli Cotton Research Institute, which provided 110 cotton genotypes, and of the University of Dicle - Faculty of Agriculture (Diyarbakir, Turkey), which provided 18 cotton pure lines. In total, 128 genotypes from various countries were studied (Fig. 1), the majority of which originated in Turkey and the United States. Brief details about the plant material are given in Table 1. The entire cotton germplasm was sown in pots under greenhouse conditions, and young, fresh, and undamaged leaves were picked for DNA extraction.

Fig. 1 Map showing the collection countries of investigated upland cotton germplasm

Table 1 Passport data of cotton genotypes used in this study

DNA isolation

Young, healthy, and fresh leaves were collected at a precise time for the extraction of genomic DNA. The CTAB protocol proposed by [30] was used for DNA extraction, with minor modifications by [31]. DNA concentration was measured with a MaestroNano Pro spectrophotometer (MN-913 A, MaestroGen Inc., Hsinchu City, Taiwan, R.O.C.) and further confirmed by electrophoresis on a 0.8% agarose gel. The final DNA concentration of the 128 accession samples to be used in polymerase chain reactions (PCR) was adjusted to 5 ng/µL, and the samples were stored at −25 °C until the start of PCR amplification.

iPBS-retrotransposon assessment

In total, 63 different iPBS-retrotransposon primers, taken from the work of [28], were screened for polymorphism using four randomly picked cotton genotypes. The eleven most polymorphic primers, which gave clear and well-resolved banding patterns, were then chosen for the genetic diversity assessment of all 128 cotton genotypes. The selected iPBS-retrotransposon primers, together with their annealing temperatures and sequences, are listed in Table 2. PCR amplification followed the protocol of Kalendar et al. [28]. Each 20 µL reaction contained 5 ng of template DNA, 1X PCR buffer (Thermo Scientific), 0.6 mM primer for 18-nt primers or 1 mM primer for 12–13-nt primers, 0.2 mM of each dNTP (Thermo Scientific, Waltham, MA, USA), 2 mM MgCl2, and 0.2 U Taq DNA polymerase (Thermo Scientific). The PCR program consisted of a pre-denaturation step at 95 °C for 3 min, followed by 30 cycles of denaturation at 95 °C for 15 s and annealing at 50–65 °C (primer-dependent) for 1 min, and a final extension step at 72 °C for 5 min. PCR amplicons were separated by electrophoresis on a 2% (w/v) agarose gel in 0.5× Tris-borate-EDTA (TBE) buffer for 155 min. The gel was then stained with ethidium bromide and visualized with the Gel Doc XR+ imaging system (Bio-Rad, Hercules, CA, USA). A 100 bp+ ladder was used as a molecular weight marker.

Table 2 List of the 11 iPBS-retrotransposon primers used to evaluate genetic diversity in 128 cotton genotypes

Data analysis

Only strong, clear, and distinct bands were considered for scoring. Because iPBS-retrotransposon markers are dominant, a binary scoring system was adopted: PCR products were scored as 1 for the presence and 0 for the absence of a clear band. A 100 bp DNA ladder was used to estimate band sizes. POPGENE software (ver. 1.32) was used to compute diversity indices including Shannon’s information index (I), the effective number of alleles (Ne), gene diversity (He), and Nei’s genetic distance [32]. The polymorphism information content (PIC) was estimated with the formula of [33], PIC = 2fi(1 − fi), where fi is the frequency of band presence and (1 − fi) the frequency of band absence for a given marker. Principal coordinate analysis (PCoA) and analysis of molecular variance (AMOVA) were performed using GenAlEx version 6.5 [34]. R statistical software (ver. 3.4.1) was used to construct the unweighted pair-group method with arithmetic mean (UPGMA) dendrogram. A Bayesian clustering approach, implemented in STRUCTURE (ver. 2.3.4, Stanford, CA, USA), was used to examine the genetic structure of the entire cotton germplasm. The most likely number of clusters (K subpopulations) was determined following the protocol proposed by [35], with 10 independent runs for each K value. For each run, the burn-in period was set to 50,000, followed by 100,000 Markov chain Monte Carlo (MCMC) iterations. ΔK, which relates the change in log probability between successive K values to its standard deviation, was then plotted against the number of clusters (K). Finally, a genotype was assigned to a population when its membership coefficient was greater than or equal to 75% (Q ≥ 75%).
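As an illustration of how a binary score matrix feeds into these indices, the following minimal Python sketch computes PIC from band-presence frequencies and approximates Ne, gene diversity, and Shannon's index per locus. It is not the POPGENE implementation: the function names and the toy matrix are hypothetical, and the allele-frequency step assumes Hardy-Weinberg equilibrium, estimating the null-allele frequency as the square root of the band-absence frequency.

```python
import numpy as np


def pic_per_locus(scores):
    """PIC = 2 * f * (1 - f), where f is the band-presence frequency per locus."""
    f = np.asarray(scores, dtype=float).mean(axis=0)
    return 2.0 * f * (1.0 - f)


def diversity_indices(scores):
    """Per-locus Ne, gene diversity (He), and Shannon's index (I).

    Assumption (not taken from the paper): Hardy-Weinberg equilibrium, so the
    null-allele frequency q is the square root of the band-absence frequency.
    """
    absent = 1.0 - np.asarray(scores, dtype=float).mean(axis=0)
    q = np.sqrt(absent)
    p = 1.0 - q
    homozygosity = p**2 + q**2
    ne = 1.0 / homozygosity          # effective number of alleles
    he = 1.0 - homozygosity          # gene diversity
    # Treat 0 * log(0) as 0 so monomorphic loci give I = 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        p_term = np.where(p > 0, p * np.log(p), 0.0)
        q_term = np.where(q > 0, q * np.log(q), 0.0)
    shannon = -(p_term + q_term)
    return ne, he, shannon


# Hypothetical toy matrix: 4 genotypes (rows) x 4 bands (columns), 1 = present.
toy = np.array([
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 1, 1],
    [1, 0, 1, 1],
])

pic = pic_per_locus(toy)
ne, he, shannon = diversity_indices(toy)
print("PIC per band:", np.round(pic, 3))
print("Mean Ne, He, I:", ne.mean(), he.mean(), shannon.mean())
```

Averaging the per-band values over all bands amplified by one primer would give primer-level summaries of the same form as those reported per primer in this study.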

Results of the genetic diversity and population distribution assessment

The eleven polymorphic iPBS-retrotransposon primers used in this investigation yielded a total of 287 bands with clear banding patterns. The average number of bands per primer was 26; primers 2385 and 2384 produced the maximum (35) and minimum (18) numbers of bands, respectively. Of these 287 strong and visible bands, 286 (99.65%) were polymorphic, with an average of 26 polymorphic bands per primer, and polymorphism ranged from 96 to 100% (Table 3). The mean PIC value was 0.297, ranging from 0.240 (primer 2386) to 0.340 (primer 2277). The same primers, 2277 and 2386, had the highest (1.579) and lowest (1.318) effective number of alleles (Ne), respectively, with an average of 1.481. Primers 2277 and 2386 also exhibited the highest (0.502) and lowest (0.336) Shannon’s information index values, respectively, with a mean of 0.443. The maximum and minimum genetic diversity values of 0.336 and 0.208 were recorded for primers 2277 and 2386, respectively. A similar pattern was observed for overall gene diversity, for which primer 2277 showed the maximum (0.315) and primer 2386 the minimum (0.155), with a mean of 0.265 (Table 3).

Table 3 Diversity indices calculated to investigate the genetic diversity in cotton germplasm using iPBS-retrotransposon primers
Table 4 Analysis of molecular variance (AMOVA) revealing genetic diversity within the studied 128 cotton accessions

Nei’s genetic distance (GD) was calculated among all 128 cotton accessions to obtain a clear picture of genetic variability. The greatest genetic distance (0.996) was found between genotypes 42 (GS-363) and 128 (GS-756), while the lowest (0.0574) was found between genotypes 125 (GS-753) and 126 (GS-754). The AMOVA revealed 31% variation among populations and 69% within populations in the cotton germplasm (Table 4). The most likely number of clusters (K subpopulations) was estimated from the position of the sharp ΔK peak at K = 4 (Fig. 2). The model-based STRUCTURE analysis divided the cotton germplasm into four distinct populations, with 35 genotypes (27.34% of the total) in Pop A, 27 (21.09%) in Pop B, 29 (22.66%) in Pop C, and 24 (18.75%) in Pop D, based on membership coefficients greater than or equal to 75%. Genotypes with lower membership coefficients were placed in an unclassified population (Fig. 3). Neighbor-joining clustering split the 128 accessions into two main groups, A and B. Group A contained 28 accessions, whereas the remaining 100 accessions were clustered in group B. Group B was further divided into two subgroups, B1 and B2, with 7 and 93 accessions, respectively (Fig. 4). The PCoA also separated the accessions into distinct groups (Fig. 5).
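The threshold-based assignment can be sketched as follows. This is a minimal illustration rather than the STRUCTURE output itself: the function name, the population labels passed to it, and the toy membership coefficients are hypothetical, while the counts 35, 27, 29, and 24 are those reported above. If those four counts are exhaustive, 13 of the 128 accessions fall into the unclassified group.

```python
import numpy as np


def assign_populations(q_matrix, labels, threshold=0.75):
    """Assign each genotype to its best-supported population if Q >= threshold."""
    q = np.asarray(q_matrix, dtype=float)
    best = q.argmax(axis=1)      # index of the population with the highest Q
    best_q = q.max(axis=1)       # the corresponding membership coefficient
    return [
        labels[k] if value >= threshold else "Unclassified"
        for k, value in zip(best, best_q)
    ]


# Hypothetical membership coefficients for three genotypes (rows sum to 1).
toy_q = [
    [0.80, 0.10, 0.05, 0.05],   # clearly Pop A
    [0.40, 0.30, 0.20, 0.10],   # admixed, below the threshold
    [0.10, 0.75, 0.10, 0.05],   # exactly at the threshold, assigned to Pop B
]
assignments = assign_populations(toy_q, ["Pop A", "Pop B", "Pop C", "Pop D"])
print(assignments)

# Shares implied by the counts reported in the text (128 accessions in total).
reported = {"Pop A": 35, "Pop B": 27, "Pop C": 29, "Pop D": 24}
total = 128
shares = {pop: round(100 * n / total, 2) for pop, n in reported.items()}
unclassified = total - sum(reported.values())
print(shares, unclassified)
```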

Fig. 2 Delta K (ΔK) showing the number of populations resulted through the application of iPBS-retrotransposons
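For readers unfamiliar with ΔK, the sketch below shows one common way to compute it from replicate STRUCTURE runs: the absolute second-order difference of the mean log probability, divided by the standard deviation of the log probability at that K. The function name and the log-probability values are invented for illustration (they are not the values from this study), and the use of the sample standard deviation (ddof = 1) is an assumption.

```python
import numpy as np


def evanno_delta_k(ln_prob):
    """Compute Delta K for each interior K.

    ln_prob maps K (consecutive integers) to the Ln P(D) values of the
    replicate runs at that K. Delta K is undefined for the smallest and
    largest K because it needs both neighbours.
    """
    ks = sorted(ln_prob)
    mean = {k: float(np.mean(ln_prob[k])) for k in ks}
    sd = {k: float(np.std(ln_prob[k], ddof=1)) for k in ks}  # assumption: sample SD
    delta = {}
    for k in ks[1:-1]:
        second_diff = abs(mean[k + 1] - 2.0 * mean[k] + mean[k - 1])
        delta[k] = second_diff / sd[k]  # fails if all runs at K are identical (SD = 0)
    return delta


# Invented Ln P(D) values, two replicate runs per K, for illustration only.
toy_runs = {
    1: [-1001.0, -999.0],
    2: [-901.0, -899.0],
    3: [-701.0, -699.0],
    4: [-691.0, -689.0],
    5: [-686.0, -684.0],
}
delta_k = evanno_delta_k(toy_runs)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, best_k)
```

The K with the largest ΔK is read off as the most likely number of clusters, which corresponds to locating the sharp peak in a plot of ΔK against K.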

Fig. 3 Population structure of upland cotton germplasm as determined by the iPBS-retrotransposons marker system

Fig. 4 Neighbor-Joining analysis of upland cotton germplasm revealed by iPBS-retrotransposons markers

Fig. 5 Principal coordinate analysis (PCoA) for upland cotton germplasm using iPBS-retrotransposons markers

Discussion

Polymorphism level in cotton germplasm based on iPBS-retrotransposons markers

Several researchers have characterized cotton germplasm in various regions of the globe but, to the best of our knowledge, none of them used retrotransposon markers. In the present study, iPBS-retrotransposon markers exhibited a higher percentage of polymorphic bands than other molecular markers employed in previous genetic diversity studies on cotton, such as AFLP [36], RAPD [37,38,39,40], EST-SSR [41], SSR [40], and ISSR [39, 40] markers. The average number of polymorphic bands per primer was greater than the 2.33 recorded by [41], who used microsatellite (SSR) markers to assess numerous Chinese cotton accessions. It was also higher than the 2.13, 3.6, 3.6, and 4.2 average bands/primer obtained by [42,43,44,45], respectively, who used SSR markers to investigate the genetic relationships among various cotton cultivars. The average number of polymorphic bands per primer was likewise greater than the 5.08 bands/primer reported by [41] using EST-SSR markers to assess the genetic relationship of various cotton cultivars from China, and greater than the 4.7 reported by [37], who used RAPD markers to investigate the genetic diversity of numerous elite cotton genotypes in Egypt. [17] used ISSR markers to analyze genetic diversity among 22 Turkish cotton (G. hirsutum L.) genotypes and also found a lower average number of bands/primer (17.2).

The polymorphism information content (PIC) value obtained in the present investigation was lower than the 0.40 reported by [42], who investigated the genetic relationship among numerous Brazilian cotton (G. hirsutum L.) cultivars using SSR markers, and the 0.46 reported by [46], who assessed the genetic relationship of several upland cotton (G. hirsutum L.) cultivars using mapped SSR markers. Similarly, it was lower than the 0.62 reported by [43], who assessed genetic diversity among upland cotton genotypes from different ecological regions of China using SSR markers, and the 0.80 reported by [41], who investigated genetic diversity among several cotton (G. hirsutum L.) cultivars from China’s main cotton-growing regions using EST-SSR markers. In the current study, iPBS-retrotransposon markers displayed a mean PIC value of 0.297, which was higher than the 0.18 reported by [15], who analyzed the genetic relationship among glandless upland cotton germplasm using AFLP markers. PIC values of 0.25 to 0.5 describe loci of moderate polymorphism [47]. The intermediate PIC value obtained here therefore suggests that iPBS-retrotransposon markers are effective for cotton genetic relationship studies. It was also reported by [48] that markers with higher PIC values are better at detecting polymorphism in a particular population.

Analysis of genetic diversity and population distribution

The presence of genetic variation in this cotton germplasm was demonstrated by estimating various diversity indices (Table 3). The average effective number of alleles (Ne) found in the present study was slightly higher than the mean Ne value (1.4432) reported by [17], who investigated the genetic relationship among 22 Turkish cotton (G. hirsutum L.) genotypes using ISSR markers, and higher than the 1.413 obtained by [40], who also employed ISSR markers to study genetic relationships in the Opal cotton (G. hirsutum L.) germplasm. Furthermore, the obtained Ne value was higher than the 1.354 reported by [46], who assessed the genetic relationship of 51 upland cotton cultivars with mapped SSR markers, and the 1.338 reported by [39], who studied genetic diversity among Mehr (G. hirsutum L.) cotton progenies and their parent cultivars using ISSR markers. [40] reported lower Ne values of 1.249 and 1.208 when employing SSR and RAPD markers, respectively, in a combined genetic diversity study on the Opal cotton germplasm. A smaller Ne value (1.12) was reported by [49], who determined genetic diversity in 21 cotton genotypes using EST-SSR markers. By contrast, the Ne value obtained with iPBS-retrotransposon markers was below the 1.7391 reported by Liu et al. [38], who examined the genetic relationship of 21 accessions of G. hirsutum L. using RAPD markers. It was also lower than the 1.514 reported by [39], who investigated genetic diversity among Mehr cotton progenies and their parent cultivars using RAPD markers.

The mean Shannon’s information index (I) was higher than the 0.4199 reported by [17], who assessed genetic diversity among 22 Turkish cotton (G. hirsutum L.) genotypes using ISSR markers. It was also higher than the 0.321 reported by [46], who studied the genetic diversity of 51 upland cotton cultivars with mapped SSR markers. Moreover, the obtained I value was higher than the values (0.318, 0.250, and 0.163) reported by [40] in a combined genetic diversity study on the Opal cotton (G. hirsutum L.) germplasm using ISSR, SSR, and RAPD markers, respectively. [39] also reported lower I values in a combined study on genetic diversity among Mehr (G. hirsutum L.) cotton progenies and their parent cultivars using RAPD (0.435) and ISSR (0.290) markers. A smaller mean value of Shannon’s information index (0.10) was reported by [49], who investigated genetic diversity in 21 cotton genotypes using EST-SSR markers. However, [38] reported a greater mean value (0.5773) of Shannon’s information index when they analyzed the genetic diversity of 21 accessions of G. hirsutum L. using RAPD markers.

In this study, the mean value of gene diversity (H) was higher than the 0.2725 reported by [17], who evaluated genetic diversity among 22 Turkish cotton (G. hirsutum L.) genotypes using ISSR markers. In the assessment of the genetic diversity of Opal cotton (G. hirsutum L.), lower H values (0.223, 0.154, and 0.114) were observed by [40] using ISSR, SSR, and RAPD markers, respectively.

A smaller mean H value (0.196) was reported by [39] when they analyzed the genetic relationship among Mehr (Gossypium hirsutum L.) cotton progenies and the parent cultivars using ISSR markers. On the other hand, a higher mean value of genetic diversity (0.4017) was reported by [38] when they analyzed the genetic diversity of 21 accessions of G. hirsutum L. by employing RAPD markers. Additionally, when [39] used RAPD markers to investigate genetic diversity among Mehr cotton (Gossypium hirsutum L.) progenies and parent cultivars, they found a slightly higher mean H value (0.294).

The larger values obtained for the different diversity indices (Ne, I, H), with few exceptions, could be attributed to the greater effectiveness of the iPBS-retrotransposon marker system in assessing genetic relationships, or to the size and nature of the cotton genotype collection. The mean genetic distance among the 128 cotton genotypes was 0.325238. Two genotypes, G42 (originating from Egypt) and G128 (originating from the USA), exhibited the highest genetic distance (0.996). Because they are genetically distinct, they can be suggested as potential parents in prospective cotton breeding studies. The lowest genetic distance was observed between G125 (originating from the USA) and G126 (originating from Turkey), suggesting a close genetic relationship. Historical facts might provide a plausible explanation: in the 1930s, leading cotton research and breeding institutions in the United States and Turkey exchanged several cotton genotypes [17]. The analysis of molecular variance (AMOVA) was used to partition the total gene diversity among and within populations and to assess genetic differentiation. AMOVA showed that most of the genetic variation (69%) resided within populations.

The clustering techniques STRUCTURE, principal coordinate analysis (PCoA), and neighbor-joining analysis have been used to elucidate the genetic diversity and population structure of various crops. The STRUCTURE algorithm has proven more robust in earlier studies [50,51,52]; therefore, in this work, STRUCTURE was used as the benchmark for cluster analysis. The STRUCTURE algorithm divided the entire germplasm into four distinct populations (Pop A to Pop D) (Fig. 2). Each accession was assigned to a population according to its individual ancestry coefficient, with the threshold set at Q ≥ 75%. Population A is an admixture of genotypes from a variety of countries, primarily Turkey (18 entries), followed by the United States (eight entries), Australia (four entries), and Pakistan (two entries), while Greece, Bulgaria, and Egypt each have one entry. Population B consisted of genotypes from seven different countries, mostly Turkey and the United States, with ten and eight entries, respectively, followed by Australia and Bulgaria, with four and two entries, respectively; Syria, Spain, and Azerbaijan each have one entry. Population C mainly consisted of cotton genotypes from Turkey (16 genotypes), along with 3 genotypes from the USA. Moreover, all cotton pure lines (18 genotypes) from Turkey were also grouped in this population. Population D is an admixture of genotypes originating from 6 different countries, led by the USA and Turkey with nine and eight genotypes, respectively. Greece and Uzbekistan have 2 genotypes each, and Iran and Tajikistan have one genotype each. Only one genotype in this population has an unknown origin. These population structure results are consistent with the Nei’s genetic distances recorded among the cotton genotypes. 
For instance, the two genotypes with the highest genetic distance, G42 (originating from Egypt) and G128 (originating from the USA), were placed in two different populations (Pop A and Pop C, respectively). Conversely, the two genotypes with the lowest genetic distance, G125 (originating from the USA) and G126 (originating from Turkey), were both grouped into population C. This may be explained by historical facts dating back to the first half of the twentieth century, when more than 40 upland cotton (G. hirsutum L.) genotypes from the USA were reportedly introduced to Turkey, mainly to two cotton research centers of that time, including the Nazilli Cotton Research Institute, and used in several breeding studies intended to improve cotton in Turkey [17]. Neighbor-joining clustering of the 128 cotton accessions produced a clustering pattern comparable to that of the model-based algorithm, with a few exceptions: the pure lines were not all clustered together, and accessions with high genetic distance were placed in the same group but far from each other. The neighbor-joining clustering grouped the studied germplasm into two populations, A and B, containing 28 and 100 genotypes, respectively. Population A was further divided into subpopulations A1 and A2, comprising 7 and 21 genotypes, respectively. This population comprised genotypes only from Türkiye and the USA. Population B was further subdivided into subpopulations B1 and B2, comprising 7 and 93 genotypes, respectively. Subpopulation B1 comprised pure lines such as D-Z-P-2, D-Z-P-15, and D-Z-P-17, together with four genotypes from Türkiye. Subpopulation B2, which clustered a total of 93 genotypes, was divided into B2I and B2II. 
Genotypes from Türkiye and the USA were present in both groups, showing genetic similarity to genotypes from other countries. One possible reason might be seed mixing during collection and conservation at the USDA gene bank. The discrepancies displayed by neighbor-joining clustering might be explained by its lower resolution relative to the model-based STRUCTURE analysis [53]. Principal coordinate analysis (PCoA) of the 128 cotton accessions, based on the 11 iPBS-retrotransposon primers, supported the STRUCTURE-based clustering and separated the accessions into five clearly distinguishable main populations (Fig. 5). The slight differences between the model-based STRUCTURE analysis and PCoA may derive from their differing clustering resolution, with more resolution provided by the model-based analysis. These differences in clustering are in line with a previous study by Nadeem [54], which used the same marker system in wheat germplasm and also preferred STRUCTURE-based clustering over other algorithms.

Conclusion

The iPBS-retrotransposon marker system is an effective DNA fingerprinting technique for assessing genetic variation among various crops. This was the first study to elucidate the genetic variation and population structure of cotton germplasm using iPBS-retrotransposon markers. Genotypes G42 and G128 showed a high genetic distance and might be recommended as parents for future breeding programs. AMOVA revealed greater genetic variation within populations than among populations. The model-based STRUCTURE algorithm and PCoA separated the germplasm into five distinct populations, and the high mean values of the diversity indices showed the robustness, applicability, and effectiveness of iPBS markers for assessing genetic diversity among various crops. This study will be helpful to cotton breeders and researchers for further breeding research on cotton.