Introduction

Sesame (Sesamum indicum L.) is one of the most ancient cultivated plants in the world and is renowned for the high content and quality of its oil (Bedigian 2003; Yu et al. 2019; Zhang et al. 2013b). Sesame, a diploid species (2n = 26) with an estimated genome size of 369 Mb (Zhang et al. 2013b), is a long-established crop widely produced in Asia (54.9%), Africa (40.8%), and South and Central America (4.3%) (Weldemichael et al. 2020). Sesame seeds are abundant in oil (50–55%), proteins (18–20%), carbohydrate (13.4–25.0%), and digestible fiber (9.8%) (Makinde and Akinoso 2013). Besides, sesame seeds are rich in various vitamins and minerals, such as pantothenic acid (0.60%), calcium (1.35%), and potassium (0.67%), as well as phosphorus, iron, zinc, magnesium, and vitamin B1 (Yuan et al. 2018). Sesame seeds are also nutritious owing to their high contents of unsaturated fatty acids and natural antioxidant lignans, mainly sesamolin and sesamin, together with small amounts of sesaminol, piperitol, sesamolinol, pinoresinol, episesaminone, hydroxymatairesinol, and lariciresinol (Shittu et al. 2008). Sesame is also used as a component in the manufacture of soaps, perfumes, cosmetics, insecticides, and pharmaceutical products (Zhang et al. 2019). Moreover, it provides health benefits, including relief from hypertension, oxidative stress, and neurodegenerative diseases (Nakano et al. 2002). Currently, the crop is cultivated in 146 countries of the world. Many poor and middle-income farmers depend on this important crop for their livelihoods.

Traditional crop improvement approaches have been used in the improvement of emerging oilseed crops for many years to produce new genotypes with desired traits, but they are slow and laborious. They can, hence, no longer sustain the global oilseed production demanded by a rapidly increasing population with declining agricultural resources. The production and quality of oilseeds need to be improved at an accelerated rate in order to meet current and future demands, because the demand for vegetable oil is growing globally and is projected to reach 240 million tons by 2050 (Barcelos et al. 2015). In 2018, the largest harvested area of sesame (3,480,000 ha) was in Sudan with total production per year of 981,000 tons, while the lowest cultivated area (210,000 ha) was in Uganda with a total yield of 140,000 tons (Table 1; Fig. 1).

Table 1 List of top nine sesame producing countries: total area, production, and yield in 2018
Fig. 1 List of top nine sesame producing countries

Despite its economic, medicinal, and nutritional importance, little attention has been given to genomics for sesame genetic improvement (Dossa et al. 2017a). In recent years, improving sesame yield and resistance to various biotic and abiotic stresses have remained the primary objectives in sesame breeding. So far, sesame researchers have given priority to the development of new genotypes with better tolerance to abiotic stresses (Dossa et al. 2016a). Several research groups have been engaged in developing genomic resources that assist breeding for high-yielding sesame varieties during the last couple of years. As a result, different molecular techniques including Random Amplified Polymorphic DNA (RAPD) (Bhat et al. 1999), Amplified Fragment Length Polymorphism (AFLP) (Laurentin and Karlovsky 2006), Simple Sequence Repeat (SSR)-based markers, such as Inter-Simple Sequence Repeat (ISSR) (Kim et al. 2002), Expressed Sequence Tags-SSR (EST-SSR) (Badri et al. 2014; Li-Bin et al. 2008; Sehr et al. 2016; Wei et al. 2011), Complementary DNA-SSR (cDNA-SSR) (López‐Villavicencio et al. 2005; Spandana et al. 2012; Surapaneni et al. 2014), Genome Sequence-SSR (Dossa 2016b; Yu et al. 2017), Chloroplast SSR (cpSSR) (Sehr et al. 2016), Restriction site-Associated DNA sequencing (RAD-seq) (Wang et al. 2016; Wu et al. 2014a), Specific Length Amplified Fragment Sequencing (SLAF-seq) (Zhang et al. 2013d), RNA-Seq (Wei et al. 2014a), Whole-Genome Sequencing (WGS) (Wang et al. 2015; Wei et al. 2015; Zhang et al. 2016), and Genotyping by sequencing (GBS) (Uncu et al. 2016), were developed for sesame improvement. Still, further applications of advanced genomics tools are required to address biotic and abiotic stresses as well as inherent genetic problems, such as seed shattering and indeterminate flowering, and to improve other desirable morphological traits. 
This review, thus, explores the various findings on molecular markers for genetic diversity studies, construction of genetic linkage maps, identification of QTLs/genes, and discovery of molecular markers using freely available online databases, and finally puts forward the possibilities of applying advanced molecular tools in future sesame improvement programs.

Genetic resources and core collection of sesame

Taxonomically, Sesamum is related to Utricularia gibba as revealed by high-throughput genome sequencing and is estimated to have diverged from U. gibba approximately 98 million years ago. On the basis of chromosome number, 38 species of sesame have been classified into three groups, a few of which are indicated below (Table 2). The five proposed centers for sesame genetic diversity include India, China, Central Asia, the Middle East, and Ethiopia (Zeven and Zhukovsky 1975). Great effort has been devoted to collecting and preserving sesame accessions in India, China, and Korea (Zhang et al. 2012a). The United States Department of Agriculture, Agricultural Research Service, Plant Genetic Resources Conservation Unit (USDA, ARS, PGRCU), holds about 1226 accessions from Africa, Asia, Europe, North America, and South America (Morris 2009). About 6658 (Bisht et al. 1998) and 7698 (Park et al. 2015) sesame accessions were conserved in India and South Korea, respectively.

Table 2 List of some important sesame species with their chromosome number variation

Moreover, several countries including China, Ethiopia, India, Israel, Kenya, Nigeria, South Korea, and the United States host gene banks for both wild and cultivated sesame (Table 2) via core collections, of which 453 genotypes are in China (Xiurong et al. 2000), 362 in India (Bisht et al. 1998), 287 in South Korea (Kang et al. 2006), and 870 in Ethiopia (Woldesenbet et al. 2015a). This is a huge germplasm, which is crucial for effective exploitation and utilization of genetic variation to facilitate future sesame improvement. However, utilization of this rich genetic resource for sesame improvement is unfortunately inadequate and the current diversity in the germplasm remains unknown. In this regard, the Asian sesame accessions have been well studied (Dossa et al. 2016b), but the sesame accessions from Africa, America, and Europe have been less preserved and characterized (Basak et al. 2019).

Sesame core collections have been established by assessing the genetic diversity of accessions collected worldwide, so that each pre-selected core fairly represents the whole collection. These core collections were developed to efficiently explore and utilize novel genetic variation (Hodgkin et al. 1995) and comprise 362 Indian accessions (Bisht et al. 1998), 453 Chinese accessions (Zhang et al. 2012b), and 278 Korean accessions (Park et al. 2015). They serve as valuable genetic resources for present and future sesame genetic improvement. However, sesame accessions from Africa have not been as well characterized and conserved as the Asian germplasm (Dossa et al. 2016b). Hence, extra efforts are required to collect indigenous accessions and wild relatives of sesame from Africa and to constitute an extensive gene bank for their efficient conservation and exploitation in future improvement programs.

Molecular markers

Molecular markers are essential to explore highly diverse genotypes and relationships among populations so as to speed up breeding programs and identify target genes in several crops. They are used to increase genetic gain and shorten breeding cycles, and are useful for molecular breeding (Varshney et al. 2014). Up to now, a number of different molecular markers have been used for genetic diversity studies, genetic linkage map construction, molecular characterization, functional online database development, and identification of genomic regions (QTLs) for different traits associated with various biotic and abiotic factors. This review, hence, presents a detailed description of these markers below.

Random amplified polymorphic DNA (RAPD) marker

RAPD markers have been employed for genetic diversity studies of sesame accessions across many countries and have revealed a significant level of polymorphism. A study of 58 accessions of sesame from 18 different states of India and four adjoining countries using 24 RAPD markers grouped 48 of the accessions into six clusters, the remaining 10 accessions being highly diverse (Bhat et al. 1999). In another study, 35 out of 38 accessions of sesame from four different areas of Turkey were clustered into six groups using seven RAPD markers (Ercan et al. 2004). Two commercial cultivars and seven lines collected from Venezuela were grouped into two major clusters using 12 RAPD markers (Salazar et al. 2006). Similarly, a set of 10 sesame germplasm accessions obtained from diverse regions of Sudan was clustered into two main groups using 10 RAPD markers (Abdellatef et al. 2008). In a comparative study of genetic diversity, 20 sesame genotypes from diverse geographical zones of India were clustered based on their geographical origin using 30 RAPD markers (Kumar and Sharma 2009). Similarly, 16 Indian genotypes of sesame collected from various geographical locations were grouped into two major clusters using 26 RAPD markers (Sharma et al. 2009). These findings revealed a significant discrimination power of the marker for the characterization of Indian sesame genotypes.

In another study, accessions of sesame from Cambodia and Vietnam were assessed and grouped into four clusters using 10 RAPD markers (Pham et al. 2009, 2011). In this study, a considerable amount of genetic diversity was reported in both the Vietnamese and Cambodian populations, with relatively high genetic diversity in populations from South Vietnam. Besides, two major clusters and six sub-groups were obtained from 20 Pakistani sesame accessions using 10 RAPD markers (Akbar et al. 2011). The authors reported that some geographically distant accessions clustered together and suggested that this resulted from the large-scale movement of Pakistani farmers, who carried sesame seeds for farming to their new geographical locations. Genetic diversity of 27 sesame accessions (13 Iranian genotypes from different locations and 14 exotic genotypes) with extensive geographical coverage was analyzed using 15 RAPD markers, and the accessions were clustered into five groups (Tabatabaei et al. 2011). Compared with other studies conducted with RAPD markers, this study reported higher variability than that of Indian, Southeast Asian, and Sudanese sesame genotypes (Abdellatef et al. 2008; Kumar and Sharma 2009; Pham et al. 2009). Besides, 47 accessions of sesame representing diverse agro-ecological zones of India were assessed and grouped into two major clusters using 22 RAPD markers (Dar et al. 2017). In their study, brown- and white-seeded varieties were reported to cluster together, while black-seeded varieties fell outside the cluster. In another study, nine sesame cultivars from Tamil Nadu (India) were clustered into two major groups using 10 RAPD markers (Anandan et al. 2018). In this study, the diversity among the available cultivars was suggested to be governed by the origin of the cultivar and the geographical conditions of the growing regions.

Many studies have applied RAPD-based techniques and have detected significant levels of variability among sesame accessions from various countries and climatic conditions, even with a limited set of primers. The RAPD technique is, thus, a robust as well as time- and cost-effective technique valuable for germplasm conservation, cultivar identification, determination of genetic variation, and effective selection of parents in sesame-breeding programs.

Amplified fragment length polymorphism (AFLP)

AFLP, a PCR-based technique, is a multi-locus fingerprinting approach that selectively amplifies a subset of restricted DNA fragments to create and compare unique fingerprints for the genomes of interest. The marker has been employed for the analysis of genetic diversity in several sesame accessions collected from different countries and agro-ecological zones (Table 3). A genetic relationship and diversity study of 32 accessions of sesame from the Venezuelan germplasm collection using 457 AFLP markers revealed a large genetic diversity within the collection; 25 of the accessions were clustered in two major groups, but without any association between genotypes and geographical origin (Laurentin and Karlovsky 2006). Moreover, 20 commercial cultivars of sesame collected from Venezuela were analyzed and grouped into four clusters using 339 AFLP markers (Laurentin and Karlovsky 2007). Based on previously studied AFLPs, the genome diversity of 10 sesame accessions representing the majority of genome diversity in India, Sudan, Venezuela, and western Asia was analyzed using AFLP markers (Laurentin et al. 2008). The authors reported that genetic analysis based on neutral markers was an inaccurate predictor of potential parental lines in sesame-breeding programs. Similarly, 96 accessions of sesame from various parts of the globe were divided into two major clusters (eastern Asian origin and South Asian origin) using AFLP techniques (Ali et al. 2007). The AFLP patterns in their study reflected the geographical origin and morphological characteristics of the accessions. In addition, 40 accessions of S. radiatum (Table 2) gathered from diverse regions of Benin were analyzed using four AFLP markers and grouped into three clusters (Adéoti et al. 2011). 
Their findings indicated that accessions from both the northern and southern regions of the country clustered together, that is, the accessions clustered independently of their geographical origin, which was similar to the finding of (Laurentin and Karlovsky 2006) but in contradiction to the finding of (Adéoti et al. 2011).

Table 3 Summary of molecular marker used for population structure and genetic diversity analysis in sesame

Simple sequence repeat (SSR)

Simple sequence repeats also known as microsatellites are highly versatile, operationally flexible, and low-cost markers applied in an extensive range of important and appropriate field of sesame improvement (Table 3), such as genetic variability, gene/QTL mapping, markers-assisted breeding (MAB), genotype identification, and evolution study (Dossa et al. 2016b; Shi et al. 2014; Surapaneni et al. 2014). A study isolated 50 SSRs from improved library of sesame, 10 of which were polymorphic and used to determine genetic diversity of 16 sesame accessions (Dixit et al. 2005). In another study, 20 genotypes of sesame, commercially cultivated in diverse geographical regions of India, were grouped into four major clusters using 10 SSR markers with the average similarity coefficient of 0.67 as a cutoff value, providing a basic tool for variety characterization and establishment of genetic association among genotypes of sesame (Kumar and Sharma 2011). Similarly, genetic diversity, population structure, and phylogenetic relationship among 150 sesame accessions collected from 22 countries were studied using 16 SSR markers and grouped into three clusters (Cho et al. 2011). Cluster analyses of this study reported a multifaceted distribution pattern of accession with mean genetic similarity coefficient of 0.52 by population and 0.45 by accession. Molecular diversity among 60 accessions of sesame germplasm originating from diverse sources (8 states of India) was studied using five polymorphic SSR markers and revealed high level of polymorphism with only some primer pairs, while the majority of them were monomorphic in nature (Spandana et al. 2012). In a different study, 545 sesame accessions were systematically analyzed and clustered into three groups with 42 SSR primer pairs (Yue et al. 2012). 
Their study showed that the genetic bases of Chinese varieties were relatively narrow while those of introduced varieties were abundant, suggesting the importance of introducing foreign sesame genotypes.

Genetic diversity study of 130 Chinese accessions of sesame including four wild germplasm accessions, 44 landraces, and 82 cultivars using 88 polymorphic SSR markers showed the existence of five subgroups belonging to two main clusters (Wu et al. 2014b). This finding suggested that sesame landraces have broader genetic bases than improved cultivars, indicating the consequence of genetic drift or selection during breeding process. Using selective hybridization strategy and 95 mining ESTs of the NCBI database, 25 microsatellite markers were developed as new set of microsatellite markers, which were analyzed and screened for genetic diversity in a collection of 16 accessions from sesame germplasm (Badri et al. 2014). This finding showed five clusters with a varied number of accessions: cluster I (two accessions), cluster II (five accessions), cluster III (five accessions), cluster IV (two accessions), and cluster V (two accessions). Similarly, 31 accessions of sesame obtained from 12 countries were screened and grouped into two clusters using 218 polymorphic SSRs (Wei et al. 2014b). The first group included 16 accessions (five from China, eight from East and South Asia, and three from Africa), while the second group included 15 accessions (12 of which were from China and the other three accessions all from the United States). In this study, most samples from China and other countries were clustered into different groups, indicating Chinese sesame accessions were distantly related to those of other countries. Genetic diversity of 96 sesame accessions, collected from 22 countries scattered over six different geographic regions in Africa and Asia, was studied using 33 polymorphic SSR markers and revealed more diversity from the Asian accessions compared to the African accessions (Dossa et al. 2016b). Molecular variance analysis of this study showed more than 44% of genetic variation due to diversity among the geographic regions. 
In addition, another study constructed a comprehensive physical map from 151 published genomic sequences along with more than 83,135 non-redundant SSRs supplied with their physical positions and characteristics (Dossa 2016). Besides providing the genomic distribution of the 151 genomic sequences, including functional genes, QTLs, and markers linked to some important sesame traits, this map sheds light on certain functional genomic regions that need in-depth research to effectively locate the functional genes. This would assist future sesame research in fine mapping of more functional genes based on the available functional markers and QTLs.

Genetic diversity analyses grouped 47 different sesame accessions assessed with 18 SSR markers (Dar et al. 2017) and nine sesame cultivars assessed with 10 SSR markers (Anandan et al. 2018) into two and three major clusters, respectively. Genetic diversity among 129 sesame cultivars and landraces from Korea and various parts of the globe was studied using 23 informative SSR markers and revealed two major and six minor clusters (Asekova et al. 2018). In this study, clustering of genotypes was independent of their geographic origin, indicating substantial gene flow among the genotypes across the selected geographic regions. Moreover, a higher degree of variation was reported among individuals within populations (85.84%) than among the populations (14.16%). Similarly, higher variance was reported among individuals within countries of origin (90.45%) than between the countries of origin.

Recently, dendrogram analysis based on 38 SSR markers clustered 25 sesame landraces collected from five districts of northern Ghana into two main clusters: cluster A, sub-divided into four sub-clusters, and cluster B, sub-divided into three sub-clusters (Adu-Gyamfi et al. 2019). Other genetic diversity studies reported five clusters from 30 diverse genotypes from different parts of West Bengal and 18 elite Indian sesame varieties analyzed using 21 polymorphic and 30 sesame-specific SSR markers, respectively (Bhattacharjee et al. 2019, 2020). Several authors reported that genotypes of different geographical origin grouped in the same cluster, showing that genotypes clustered independently of their geographical origin (Asekova et al. 2018; Bhattacharjee et al. 2019, 2020; Kim et al. 2002; Laurentin and Karlovsky 2006; Pandey et al. 2015; Parsaeian et al. 2011). This contradicts the usual assumption that selecting genotypes from different geographical origins maximizes the chance of producing segregation leading to diversity. Analysis of genetic diversity of 36 sesame accessions using 10 SSR markers revealed nine groups with dissimilarity measures varying from 2.49 to 9.0, suggesting a narrow genetic basis among the 36 accessions evaluated (Araújo et al. 2019). In a very recent study, 22 sesame germplasm accessions collected from two ecological zones of Nigeria were studied using 12 SSR markers (Oduoye et al. 2020) (Table 4). This study revealed three major clusters and one outlier, where the savanna was the more diverse ecological zone with 13 accessions and the humid forest was less diverse with nine accessions. More recently, 100 sesame accessions collected from the Tigray, Afar, Amhara, Gambela, and Oromiya regions of Ethiopia were studied using 27 SSR markers, of which 63 accessions were allocated to four populations, whereas 37 accessions were admixtures with no specific membership (Teklu et al. 2021). 
Their findings reported the following population clusters: population I (24 accessions), population II (13 accessions), population III (nine accessions), and population IV (17 accessions).

Table 4 List of functional online databases for sesame genomics

Besides their well-established application in genetic diversity studies, SSR markers are a suitable marker type for the construction of genetic maps and for the development of databases to search for sesame SSR information at the whole-genome level and perform functional analysis (Tables 4 and 5). Thus, SSR markers have been employed for the development of freely available and accessible web-based databases, such as Sinbase, which provides a useful search function and data mining tools (Wang et al. 2015), SisatBase, which gives free access to microsatellite marker data and a unified platform for functional analysis (Dossa et al. 2017b), and PMDBase, which helps researchers identify microsatellites (Du et al. 2020). These functional databases are beneficial resources for genomic studies, genetic assessment, and breeding advancement in sesame. Moreover, based on an F2 segregating population of an intra-specific cross between two cultivars, the first sesame linkage map was constructed comprising 220 molecular markers with a total genetic length of 936.72 cM and an average distance of 4.93 cM between markers on 30 LGs (Wei et al. 2009). This opened the door to the construction of other linkage maps and to increasing map density using different types of markers. As a result, a high-density linkage map containing 653 (30 SSR, SNP, 50 AFLP and 573 RSAMPL) marker loci distributed on 14 LGs, covering a total length of 1,216 cM with 1.86 cM between adjacent markers, was constructed and used to map a QTL for seed coat color (Zhang et al. 2013c). These findings provided a foundation for future improvement of sesame MAB.
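The average marker spacing quoted for these maps can be reproduced from the reported map length and marker count. The following is a minimal sketch in Python; the function name and the two counting conventions are illustrative assumptions, since the source does not state which convention each study used. Dividing by the number of intervals (markers minus linkage groups) reproduces the 4.93 cM reported for the first map, while dividing by the number of markers reproduces the 1.86 cM reported for the high-density map.

```python
def mean_marker_interval(map_length_cm, n_markers, n_linkage_groups=0):
    """Mean distance (cM) between adjacent markers on a linkage map.

    Hypothetical helper for illustration only.
    n_linkage_groups = 0 -> map length divided by the number of markers.
    n_linkage_groups > 0 -> map length divided by the number of intervals,
                            i.e. markers minus linkage groups, because each
                            linkage group with k markers has k - 1 intervals.
    """
    intervals = n_markers - n_linkage_groups
    if intervals <= 0:
        raise ValueError("need more markers than linkage groups")
    return map_length_cm / intervals


# Figures as quoted in the text above.
first_map = mean_marker_interval(936.72, 220, n_linkage_groups=30)  # Wei et al. 2009
dense_map = mean_marker_interval(1216.0, 653)                       # Zhang et al. 2013c

print(f"first map:        {first_map:.2f} cM between markers")
print(f"high-density map: {dense_map:.2f} cM between markers")
```

Under these assumptions, the two maps differ in average spacing by a factor of roughly 2.6, which is one simple way to compare map density across studies.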

Table 5 Summary of genetic and association mapping studies for different traits of sesame

Inter-simple sequence repeat (ISSR)

ISSR markers have been established and employed mainly for genetic diversity studies of sesame during the last couple of decades. About 75 Sesamum indicum accessions obtained from diverse areas in Korea (26 cultivars, 17 breeding lines and landraces, and 32 introductions from different countries across the world) were characterized using ISSR primers and clustered into seven groups based on unweighted pair‐group method with arithmetic averages (UPGMA) cluster analysis (Kim et al. 2002). Another ISSR-based dendrogram analysis classified 16 sesame genotypes collected from different geographical locations of India into two clusters, showing maximum discrimination power of the marker for the characterization of sesame genotypes (Sharma et al. 2009). In another study, 20 commercially grown sesame genotypes from various agro-climatic regions of India were grouped into five major clusters using 25 ISSR primers with an average similarity of 0.78 as a cutoff value, suggesting that these markers provide a powerful tool for genotype identification and establishment of genetic relationships among sesame genotypes (Kumar and Sharma 2011). In a different study, a set of 34 confirmed polymorphic ISSR primers was reported to cluster 94 accessions of sesame from 25 countries into six major groups based on their geographical origin, viz., Indian sub-continent (50), Central Asia (11), America (12), East Asia (eight), Europe (two), and Africa (11) (Kumar et al. 2012). Besides, a genetic diversity study of 18 sesame genotypes obtained from various regions of Iran and six accessions of sesame from other Asian countries revealed seven groups using ISSR primers (Parsaeian et al. 2011). According to this study, selecting parental lines for hybridization based on genetic diversity is better than selecting them based on geographical distribution.
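To make the UPGMA clustering step concrete, the following is a minimal sketch in plain Python of how a presence/absence band matrix from dominant markers (ISSR, RAPD, or AFLP) can be clustered and cut at a similarity threshold. The band profiles, accession names, the choice of the Jaccard coefficient, and the 0.5 similarity cutoff are all hypothetical illustrations and are not taken from any of the studies cited above.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - (bands shared by both) / (bands present in at least one)."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 1.0 - shared / either if either else 0.0


def upgma_clusters(profiles, max_distance):
    """Average-linkage (UPGMA) agglomeration of band profiles.

    Merging stops once the two closest clusters are farther apart than
    max_distance, which is 1 minus the chosen similarity cutoff.
    """
    names = list(profiles)
    dist = {frozenset(pair): jaccard_distance(profiles[pair[0]], profiles[pair[1]])
            for pair in combinations(names, 2)}

    def average_distance(c1, c2):
        # UPGMA: unweighted mean of all pairwise distances between members.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [[name] for name in names]
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: average_distance(clusters[ij[0]], clusters[ij[1]]))
        if average_distance(clusters[i], clusters[j]) > max_distance:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


# Hypothetical band matrix: 1 = band present, 0 = band absent at each locus.
bands = {
    "acc1": [1, 1, 0, 1, 0, 0, 1, 0],
    "acc2": [1, 1, 0, 1, 0, 1, 1, 0],
    "acc3": [1, 0, 0, 1, 0, 0, 1, 0],
    "acc4": [0, 0, 1, 0, 1, 1, 0, 1],
    "acc5": [0, 0, 1, 0, 1, 1, 0, 0],
    "acc6": [0, 1, 1, 0, 1, 1, 0, 1],
}

similarity_cutoff = 0.5  # illustrative value only
groups = upgma_clusters(bands, max_distance=1.0 - similarity_cutoff)
for group in groups:
    print(sorted(group))
```

With this toy matrix the six hypothetical accessions fall into two groups of three. Raising the similarity cutoff splits the groups further, which is why the cutoff value should be reported alongside the number of clusters.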

In Ethiopia, UPGMA cluster analysis of six varieties and farmers’ landraces from north-western Ethiopia grouped each individual of the respective populations in separate groups using four ISSR primers (Admas Alemu and Tesfaye 2013). Similarly, a genetic diversity study of 128 genotypes of sesame (comprising 119 landraces and nine commercial varieties) collected from 20 Administrative Regions and Zones of Ethiopia resulted in three major groups sub-grouped into nine clusters using 17 ISSR primers (Abate et al. 2015). In this study, most of the landraces were grouped in separate clusters irrespective of their geographic origins, while the cultivars were grouped in one cluster, suggesting less variability within the released varieties than within the landraces. Another study grouped 120 (38 exotic and 82 Ethiopian) sesame accessions into two main clusters and three outliers using six ISSR primers (Woldesenbet et al. 2015b). This study showed a weak association between ecological region and genetic variation of sesame genotypes, and less diversity in exotic accessions than in Ethiopian ones. In addition, 10 genotypes of sesame obtained from various regions of the country were characterized using 11 ISSR primers and clustered into four groups (Gebru et al. 2019). More recently, 31 genotypes of sesame collected from various locations of the Tadla area in Morocco were characterized using 24 ISSR primers and grouped into three large groups and seven independent branches (El Harfi et al. 2021). This result indicated that Moroccan sesame populations are characterized by low genetic diversity, and hence that new germplasm needs to be either created through mutation breeding or introduced from diverse geographical origins in order to broaden the existing genetic variability. 
In addition, the genetic diversity of various cultivated sesame genotypes was studied in Saudi Arabia using karyological, physiological, biochemical, and molecular markers, and the genotypes were grouped into two main clusters (Alshamrani et al. 2022).

Expressed sequence tags-SSR (EST-SSR)

The development of EST-SSR markers from publicly available sesame EST data has accelerated the application of molecular markers in sesame improvement (Table 3). In this regard, Illumina paired-end sequencing identified 148 SSR sequences consisting of 155 EST-SSRs from a total of 1785 non-redundant EST sets assembled among 3328 identified sesame ESTs (Li-Bin et al. 2008). This study found 27 (61.4%) out of 44 successfully amplified EST-SSRs to be polymorphic in 36 globally collected sesame accessions and classified the accessions into four groups based on UPGMA dendrogram analysis. Similarly, Illumina paired-end sequencing of transcriptomes from five sesame tissues allowed a total of 7702 uni-genes to be converted into SSR markers (EST-SSR), dominated by dinucleotide SSR repeat motifs (67.07%) (Wei et al. 2011). In this study, 40 primer pairs out of 50 randomly selected EST-SSRs amplified DNA fragments successfully and detected a high level of polymorphism among 24 sesame accessions. In another study, 20 EST-derived SSRs were used to study the diversity of 49 sesame accessions and grouped them into five separate clusters exhibiting a genetic similarity coefficient ranging from 0.59 to 1.0 (Yepuri et al. 2013). Moreover, 60 globally collected genotypes were evaluated in India and clustered into two major groups using eight polymorphic EST-SSR markers, where genotypes belonging to the same geographical area did not always occupy the same cluster (Pandey et al. 2015). More recently, 35 sesame landraces collected from Bulgaria, Greece, Italy and several Asian countries were studied using seven EST-SSR markers and clustered into three clusters: cluster I (12 cultivars), cluster II (9 cultivars) and cluster III (16 cultivars) (Stavridou et al. 2021). 
The molecular and morphological analysis in this study clearly showed that sesame cultivars collected from similar geographical origins were not always clustered in the same group, making heterotic clusters; nevertheless, grouping patterns were detected for the Greek landraces. These EST-derived SSRs could be valuable to study the relationships among sesame accessions and help identify distinct parents for further sesame improvement programs.

Complementary DNA-SSR (cDNA-SSR)

Complementary DNA-SSRs are markers directly associated with transcribed genes, and they offer some fundamental advantages over genomic SSRs, namely a lower development cost and a significant level of transferability to related species (Wang et al. 2010). Hence, 59 cDNA-SSR markers were developed from unigenes and used to genotype 36 sesame accessions and investigate the genetic variation among them (Wang et al. 2012a). Such characterized and identified polymorphic cDNA-SSR markers will greatly facilitate the analysis of genetic structure, conservation, and identification of sesame.

Chloroplast SSR (cpSSR)

Chloroplast-SSRs are newly developed markers, which have become a marker of choice for the study of genetic diversity in several crops including sesame. In sesame, very few genetic studies have been reported using cpSSRs (Table 3). Seven large repeated loci and 28 different SSR loci were detected in the chloroplast genome of S. indicum, of which 21 loci were homopolymers, four di-polymers, and three tri-polymers (Yi and Kim 2012). The seven large repeated loci identified in the S. indicum chloroplast genome are useful for developing sesame-specific chloroplast genome vectors. From 121 Ugandan accessions of sesame, four haplotypes were identified, with 96% of the individuals belonging to haplotype B (Sehr et al. 2016). In addition, their findings reported 117 sesame landraces grouped in 13 districts based on nine SSR markers.
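The classification of SSR loci by repeat unit length (homopolymer, di-, and tri-nucleotide repeats) can be illustrated with a short regular-expression scan. The following is a minimal sketch in Python; the minimum repeat thresholds, the function name, and the toy sequence are hypothetical and do not reproduce the settings or data of the studies cited above.

```python
import re

# Hypothetical minimum number of repeat units per motif length.
MIN_REPEATS = {1: 10, 2: 6, 3: 5}
MOTIF_CLASS = {1: "mono", 2: "di", 3: "tri"}


def find_ssrs(sequence, min_repeats=None):
    """Return (start, motif, repeat_count, motif_class) for perfect SSRs."""
    thresholds = MIN_REPEATS if min_repeats is None else min_repeats
    seq = sequence.upper()
    hits = []
    for size, minimum in thresholds.items():
        # A motif of `size` bases followed by at least `minimum - 1` exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, minimum - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip di-/tri-nucleotide "motifs" that are really homopolymers.
            if size > 1 and len(set(motif)) == 1:
                continue
            repeats = len(match.group(0)) // size
            hits.append((match.start(), motif, repeats, MOTIF_CLASS[size]))
    return sorted(hits)


# Toy sequence with one homopolymer, one dinucleotide and one trinucleotide SSR.
toy = "GCGC" + "A" * 12 + "CGTACG" + "AT" * 7 + "GGCC" + "GAA" * 5 + "TC"

ssrs = find_ssrs(toy)
for start, motif, repeats, kind in ssrs:
    print(f"{kind:>4}  motif={motif}  repeats={repeats}  start={start}")
```

Counting the hits per class gives the kind of summary reported for the chloroplast genome (numbers of homopolymer, di-, and tri-nucleotide loci), although real pipelines usually also handle imperfect and compound repeats, which this sketch ignores.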

Recently developed molecular markers

Insertions and deletions (InDels)

InDels have been developed and used in sesame genetic analysis because of their high abundance, high polymorphism, and reproducibility in the sesame genome. Accordingly, the molecular and genetic variability of 130 sesame accessions from China was comprehensively characterized using 88 polymorphic markers, which revealed two main groups with five subgroups (Wu et al. 2014b). The polymorphism rate of the InDels in this study (48.0%) indicated a better capacity to discern genetic diversity, in contrast to the 3.8% reported by Wang et al. (2012a). The authors also developed 79 InDels from the cultivar Zhongzhi 14 using a reduced-representation sequencing method. A different study reported the detection of 17,961 small InDels by parallel re-sequencing of 29 strains of sesame obtained from 12 different countries and found consistent phylogenetic relationships among the Indian sesame accessions (Wang et al. 2014a). This study provides an insight into the genetic dissimilarity of the sesame germplasm and facilitates a broad range of evolutionary and functional studies, such as population genetics, genome evolution, gene identification, and marker-assisted breeding. In addition, 45 InDels in combination with 259 SNPs and 124 SSR markers were used to construct a new genetic map spanning 1790.08 cM using 107 F2 lines derived from Zhongzhi No. 13 (ZZM2289) population (Wang et al. 2014b). These authors suggested an association of high genetic variation between lipid-related genes and oil content from the re-sequenced data of 29 sesame accessions collected from 12 countries. Recently, 7477 InDel sites were discovered using ddRADSeq data from 95 accessions belonging to the Mediterranean sesame core collection, the first successful use of this approach (Kizil et al. 2020). The InDel frequency in this study (1 per 37.74 kb) was much higher than the 1 per 137 kb reported by Wei et al. (2014a), who used a transcriptome assembly for InDel detection. 
Moreover, more InDels (7477) were found in this study than the 75 genomic InDel markers reported by Wu et al. (2014b), who used restriction site-associated DNA (RAD) sequencing of ‘Miaoqianzhima’ and ‘Zhongzhi 14’. This study also revealed no clear pattern with respect to geographical origin, which is in agreement with the work of Laurentin and Karlovsky (2006), where no association was reported between accession origin and genetic differentiation in sesame. The differences in InDel number and frequency among studies could, hence, be a consequence of different factors, such as the number of genotypes used for genotyping, the sequencing method, and the bioinformatics tools used for the exploration of variants.

Restriction site-associated DNA sequencing (RAD-Seq)

Next-generation sequencing (NGS) techniques have recently been used for genome sequencing because they allow fast and cost-efficient development of sesame-specific DNA markers. RAD-seq is used to sequence the genomic regions flanking specific restriction enzyme sites and thereby produce a reduced representation of the genome (Wang et al. 2012b). RAD-seq is also used for genotyping and SNP discovery in sesame (Wang et al. 2016; Wu et al. 2014b). In this regard, RAD-seq was used to develop 3,804 pairs of new DNA markers, comprising SNPs (3769) and InDels (35), which were combined with SSR markers to construct the highest-density genetic map then available, using a RIL population derived from a cross between the sesame genotypes ‘Zhongzhi 14’ and ‘Miaoqianzhima’ (Wu et al. 2014b). This map was about 844.46 cM in length and contained 1,230 markers (18 InDels, 22 SSRs and 1190 SNPs) distributed on 14 LGs, with an average distance of 0.69 cM between adjacent markers. This study also identified a total of 30 QTLs for seven grain yield-related traits and was the first QTL map for yield-related traits using a RIL population. By adding new RAD-seq data, the first bin map was generated, containing 1522 bins on 13 LGs and covering a genetic distance of 1090.99 cM with 0.72 cM between adjacent bins; it identified 50 QTLs, 9 for seed coat color and 41 for plant height (Wang et al. 2016). This was the first genetic map reported in sesame to have the same number of linkage groups as the number of chromosomes. The bin distance reported in this study (0.72 cM) was in line with the 0.69 cM obtained by Wu et al. (2014b) using RAD-seq, but shorter than that of maps constructed using randomly selective amplification markers (1.86 cM) (Zhang et al. 2013c) or the SLAF-seq technology (1.20 cM) (Zhang et al. 2013d). Moreover, more than 99% of the interval distances between adjacent bins were shorter than 6 cM, indicating the high density of the new sesame genetic map. The aforementioned studies reported high-density genetic maps that would be ideal for comparative genomic analysis, map-based cloning for yield-related traits, QTL/gene mapping, and applying marker-assisted selection (MAS) for sesame genetic improvement.
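
The average spacings quoted above can be approximated by dividing the total map length by the number of mapped loci (markers or bins). The following is a minimal sketch using only the figures given in the text; the function name `mean_spacing_cm` is hypothetical, and the original studies may have averaged per interval or per linkage group, so small discrepancies are possible.

```python
def mean_spacing_cm(map_length_cm: float, n_loci: int) -> float:
    """Approximate average spacing (cM) as total map length / number of loci."""
    if n_loci <= 0:
        raise ValueError("n_loci must be a positive integer")
    return map_length_cm / n_loci


# (map length in cM, number of markers or bins), as reported in the text
rad_seq_maps = {
    "Wu et al. 2014b, marker map": (844.46, 1230),
    "Wang et al. 2016, bin map": (1090.99, 1522),
}

for label, (length_cm, n_loci) in rad_seq_maps.items():
    print(f"{label}: {mean_spacing_cm(length_cm, n_loci):.2f} cM")
```

Under this approximation the two maps come out at about 0.69 cM and 0.72 cM, matching the reported values.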

Specific length amplified fragment sequencing (SLAF-Seq)

SLAF-seq is a recently developed high-resolution strategy used for large-scale genotyping and discovery of new SNPs (Table 5). SLAF-seq was employed to detect a total of 71,793 high-quality SLAFs, of which 3673 were polymorphic, and 1272 of these were used to construct a sesame genetic map (Zhang et al. 2013d). This study developed a dense genetic linkage map of sesame containing 1,233 markers on 15 LGs, spanning 1474.87 cM with 1.20 cM between adjacent markers; of these markers, 1079 were ‘SNP_only’, 86 were ‘InDel_only’, and 68 were ‘SNP & InDel’ markers. In addition, the study reported segregation distortion for 205 markers, comprising 171 ‘SNP_only’ markers, 20 ‘InDel_only’ markers, and 14 ‘SNP & InDel’ markers (83.41%, 9.76%, and 6.83%, respectively). Another study generated 81.2 Gb of sequence data containing 902.36 million paired-end reads using SLAF-seq on 366 sesame germplasm accessions collected from China (329) and 11 other countries (37) (Cui et al. 2017). Similarly, a high-resolution genetic map containing 9,378 SLAF markers on 13 LGs was constructed using SLAF-seq, spanning 1974.23 cM with a mean LG length of 151.86 cM and a distance of 0.22 cM between adjacent markers (Mei et al. 2017). The authors mapped genes for basal branching habit (SiBH) and flowers per leaf axil (SiFA) on LG5 and LG11, respectively, and suggested that their map could serve as a reference linkage map to anchor sequence scaffolds of the physical map.
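
As a quick consistency check on the segregation distortion figures reported by Zhang et al. (2013d), the share of each marker class among the 205 distorted markers can be recomputed from the counts given in the text. This is only an illustrative sketch; the helper name `class_shares` is hypothetical.

```python
def class_shares(counts: dict) -> dict:
    """Return each class as a percentage of the total, rounded to 2 decimals."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts must sum to a positive number")
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


# Distorted markers per class, as reported in the text
distorted = {"SNP_only": 171, "InDel_only": 20, "SNP & InDel": 14}

shares = class_shares(distorted)
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

The recomputed shares (83.41%, 9.76%, and 6.83%) agree with the percentages quoted above, and the three counts sum to the stated 205 markers.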

More recently, SLAF-seq was used to construct a high-density SNP linkage map of 2128.51 cM in length, comprising 2159 SNPs distributed on 13 LGs with an average distance of 0.99 cM between adjacent markers (Du et al. 2019). This genetic map contained the same number of linkage groups as the number of sesame chromosomes, in agreement with the RAD-seq map of Wang et al. (2016). The authors reported that this was the first QTL map for seed-related traits in sesame using an F2 population, revealing 17 important seed-related traits and 19 major-effect QTLs (eight QTLs for seed coat color, nine QTLs for seed size, and two QTLs for thousand-seed weight).

RNA sequencing (RNA-Seq)

The advancement of large-scale RNA-seq has enabled the generation of large expression datasets in non-model species owing to its high speed, cost-effectiveness, and reliability (Marioni et al. 2008). In a comprehensive transcriptome study of sesame samples from different developmental and growth stages, RNA-seq identified 2164 genic-SSR markers in 4440 SSR-containing sequences (Zhang et al. 2012a). From the SSR markers identified in this study, primer pairs for 300 SSRs were randomly selected and validated in 25 sesame accessions, of which 276 (92.0%) successfully yielded PCR amplification products. In addition, the genetic similarity coefficients in this study showed a low correlation between geographical source and genotype as well as a narrow genetic basis of sesame in China, which agrees with other findings based on SSR markers (Wu et al. 2014b; Yue et al. 2012). Similarly, screening of 29 sesame accessions of variable oil content from 12 countries generated a new high-density genetic map with more than 120 Gb of clean data and identified 2,348,008 SNPs (Wang et al. 2014b). In a different study, a total of 7450 SNPs and 362 InDels were identified from 33.47 Gbp of data from three sesame transcriptome datasets using RNA-seq (Wei et al. 2014a). The newly developed polymorphic markers (21 SNPs and 16 InDels) were used to evaluate the genetic diversity of 39 sesame cultivars and to classify them into four groups at a similarity value of 0.636. These new genic-SSR and SNP/InDel markers provide a useful tool for sesame genetic research and comparative genome analysis, and should improve the identification of candidate functional genes and the efficiency of MAS.

Whole-genome sequencing (WGS)

Advances in genetic and genomic research on sesame are made possible by an understanding of whole-genome sequencing techniques. WGS has been widely applied in SNP discovery and genotypic studies in sesame (Table 4). Accordingly, putative candidate genes have been identified for different desirable traits, including oil quality and content, yield components, fatty acid composition, morphological characteristics, growth cycle, coloration, and disease resistance (Wang et al. 2015; Wei et al. 2015; Zhang et al. 2016). These will support further research on functional genomics, such as RNA interference (RNAi) and genome editing using clustered regularly interspaced short palindromic repeats/Cas9 (CRISPR/Cas9), transcription activator-like effector nucleases (TALENs), and zinc finger nucleases (ZFNs), as well as genetic transformation. WGS of 29 sesame strains generated more than 120 Gb of filtered data and identified an average of 127,347 SNPs (Wang et al. 2014b). This study resolved two major groups, monoflower (13) and triple-flower (16), and identified 695 genes with equivalent SNPs between the two pools. In addition, the authors proposed that the number of flowers per leaf axil in sesame is determined by adenylate isopentenyltransferase (IPT) genes, suggesting a novel role for this gene. Moreover, WGS has been used to develop various comprehensive, functional, and user-friendly online databases, such as SesameFG (Wei et al. 2017b), SesameHapMap (Wei et al. 2015), Sinbase (Wang et al. 2014a), and the Sesame Genome Project (Zhang et al. 2013b), which could play vital roles in functional and comparative genetic studies of desirable traits in sesame. Similarly, another freely available online database, ‘GinMicrosatDb’, has been developed using WGS data of the sesame variety ‘Swetha’; it consists of microsatellites localized on both LGs and scaffolds, together with their genomic coordinates (Purru et al. 2018). These databases facilitate the selection of unlinked markers across linkage groups and ensure their random distribution, thereby reducing bias in estimates of genetic diversity and relatedness of sesame genotypes. Overall, such freely available online databases can be easily accessed by researchers and provide useful search functions and mining tools, including a genome browser and a local BLAST service.

With regard to SNP-based genetic diversity analysis in sesame, a large number of high-throughput SNP markers have been discovered using recently developed marker technologies (Table 6), including 362 InDels and 7450 SNPs using RNA-seq (Wei et al. 2014a), 15,693 SNPs using RAD-seq (Wang et al. 2016; Wu et al. 2014a), more than 15,000 SNPs using GBS (Uncu et al. 2016), and 93,053 SNPs using SLAF-seq (Cui et al. 2017; Du et al. 2019; Mei et al. 2017) from various datasets and varieties. Markers based on high-density SNPs are highly favored for studies of genetic diversity and population structure, high-density genetic linkage map construction, and genome-wide association studies of sesame accessions (Table 5). SNP genotyping is a useful tool for mapping QTLs/genes, map-based cloning, and MAS in several crops (Hayashi et al. 2006). A set of 366 sesame germplasm accessions collected from 18 provinces of China and 11 other countries was clustered into three sub-groups (Pop1, Pop2 and mixed) using 89,924 SNPs (Cui et al. 2017). This clustering followed primarily the geographic origin of the collections, which contradicts the majority of other findings (Asekova et al. 2018; Bhattacharjee et al. 2019, 2020; Kim et al. 2002; Laurentin and Karlovsky 2006; Pandey et al. 2015; Parsaeian et al. 2011). In another study, combinations of different markers were reported to provide a useful protocol for the identification and authentication of various sesame cultivars (Wei et al. 2017a). In that study, a combination of 140 polymorphic markers (47 SNPs, 47 InDels and 46 SSRs) was employed for DNA fingerprinting of 151 Chinese sesame cultivars, which were classified into three main groups. This revealed a narrow range of genetic variation among the cultivars that was consistent with their geographical distribution (Wei et al. 2017a), in agreement with other findings based on SSRs (Wu et al. 2014b; Yue et al. 2012) and RNA-seq (Zhang et al. 2012a). Recently, 198 sesame accessions from 21 geographical regions spread over four continents were analyzed using 5292 high-quality SNPs and were clustered into three sub-groups (Basak et al. 2019). This study attributed about 5.53% of the total genetic variation to differences among continental groups, indicating high diversification of the Mediterranean sesame core collection.

Table 6 Summary of SNP marker-based population structure and genetic diversity analysis in sesame

In a high-throughput SNP identification and genotyping study, a total of 15,521 SNPs were identified from 91 sesame RILs derived from the cross Acc. No. 95-223 (Africa) × Acc. No. 92-3091 (Korea), of which 14,786 SNPs (95.26%) were located on the pseudomolecules of the sesame genome assembly (Uncu et al. 2016). Subsequently, 770 SNPs selected through a high-stringency filtering protocol were used together with 50 SSRs in linkage analysis, which resulted in 13 LGs encompassing a total genetic distance of 914 cM with 432 markers (420 SNPs and 12 SSRs). In addition, an ultra-dense SNP map comprising 30,193 SNPs in 13 LGs with an average marker density of 0.10 cM was constructed by resequencing 120 progeny of an F2 population derived from the cross Yuzhi 11 (indeterminate) × Yuzhi DS899 (determinate) (Zhang et al. 2016). This study identified the target SNP (SiDt27-1) and the determinacy gene (DS899s00170.023SiDt) in Scaffold 00170 of the Yuzhi 11 reference genome. Another study reported a high-density bin map 1086.403 cM in length, containing 13,679 SNPs with an average distance of 0.918 cM between adjacent bins, constructed using GBS in an F2 sesame population (Zhang et al. 2018). This study reported a novel sesame determinacy gene mapped on LG09 within a 41 kb genomic region, which could be a basis for developing sesame cultivars suited to mechanization. More recently, 300 genotypes comprising 75 exotic collections from different Asian and African countries, 209 Ethiopian landraces, and 16 varieties were studied using Diversity Arrays Technology (DArT) and SNP markers (Tesfaye et al. 2022). In this study, a total of 6474 SNP markers and 6115 silicoDArT markers were discovered, of which 5821 SNP and 5065 silicoDArT markers, respectively, were aligned with the reference sesame genome.
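
The proportion of discovered markers that could be placed on the reference genome is a simple ratio of aligned to discovered markers. The sketch below recomputes it from the counts in the text; the function name `percent_aligned` is hypothetical, and the percentages for Tesfaye et al. (2022) are derived here from the reported counts rather than quoted from that study.

```python
def percent_aligned(aligned: int, discovered: int) -> float:
    """Percentage of discovered markers that align to the reference genome."""
    if discovered <= 0:
        raise ValueError("discovered must be a positive integer")
    if not 0 <= aligned <= discovered:
        raise ValueError("aligned must lie between 0 and discovered")
    return 100 * aligned / discovered


# (aligned, discovered) marker counts, as reported in the text
marker_sets = {
    "Uncu et al. 2016, SNPs": (14786, 15521),
    "Tesfaye et al. 2022, SNPs": (5821, 6474),
    "Tesfaye et al. 2022, silicoDArT": (5065, 6115),
}

for label, (aligned, discovered) in marker_sets.items():
    print(f"{label}: {percent_aligned(aligned, discovered):.2f}% aligned")
```

For the Uncu et al. (2016) counts this reproduces the 95.26% stated above.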

Concluding remarks and future perspectives

Sesame is one of the oldest oilseed crops and is widely cultivated in the tropics and subtropics of the world. The production and productivity of sesame are, however, highly affected by both biotic and abiotic stresses. Traditional crop improvement approaches have been employed in sesame for many years to produce new genotypes with desired traits, but they are very slow and laborious. To overcome the limitations of traditional breeding, several molecular marker systems, such as RAPD, AFLP, SSR, ISSR, EST-SSR, cDNA-SSR, cpDNA, InDel, RAD-seq, SLAF-seq, RNA-seq, and WGS, have been developed and made available for the genetic improvement of sesame. Such molecular markers are valuable and provide a foundation for cultivar identification and conservation of the species, genetic diversity studies, linkage mapping, qualitative and quantitative trait mapping, whole-genome sequence data, transcriptome-based research, marker-assisted breeding, and transferability studies among sesame accessions. This review presented the application of molecular markers to population structure and genetic diversity studies, the construction of genetic linkage maps, and the discovery of molecular markers for freely available online databases using different approaches. Understanding the population structure and diversity of sesame collections is essential for efficient management of genetic resources to improve sesame breeding. The availability of a dense genetic map would also be useful for fine mapping of genes/QTLs, map-based gene identification, and advanced molecular breeding in sesame, and would act as a reference for positioning sequence scaffolds on a physical map, thereby assisting the assembly of sesame genome sequences. Moreover, the freely available online databases and integrative analytical web services help to enhance the application of molecular markers to studies of genome evolution and structural variants, gene regulation, and future advanced sesame breeding. Hence, the following are future perspectives for further sesame improvement:

  1. Africa is the second largest producer of sesame after Asia, but the genetic diversity of African sesame accessions is not well studied. There is, hence, a need to conduct further genetic diversity studies using recently developed molecular markers in the major sesame-producing African countries, including Ethiopia, Nigeria, South Sudan, Sudan, Uganda, and the United Republic of Tanzania.

  2. Collection of cultivars from the various growing areas of African countries, with the intention of establishing sesame gene banks, is urgently needed.

  3. Locally adapted and wild species of sesame are rich in genomic resources, which play a significant role in developing resistance genes to different stresses and provide high genetic variation. Wild species are, therefore, important for contributing the required genes when crossed with the available elite sesame genotypes in order to improve their resistance to different biotic and abiotic stresses and their adaptability to various environmental conditions. There is thus an urgent need to undertake intensive marker-assisted breeding to unravel potential candidate genes associated with abiotic and biotic stresses in sesame.

  4. This review has covered the progress of various molecular markers, including RAPD, AFLP, SSR, ISSR, EST-SSR, cDNA-SSR, and cpSSR, as well as recently developed markers, such as InDels, RAD-seq, SLAF-seq, RNA-seq, WGS, and SNPs. Special attention should therefore be given to these research findings for sesame improvement through high-density genetic map construction, comparative genomic analysis, map-based cloning for yield-related traits, QTL/gene mapping, and marker-assisted selection (MAS). Moreover, molecular markers related to different traits, such as yield and yield-related traits, branching habit, determinate growth, non-shattering, oil content and quality, tolerance to waterlogging and drought stresses, and resistance to various diseases and insect pests, should be applied to achieve high specificity in crop improvement programs.

  5. The production and productivity of African sesame, especially Ethiopian sesame genotypes, are very low owing to low genetic potential and to biotic and abiotic factors. To improve sesame production, it is therefore very important to understand the ideal environmental conditions and sesame phenotypes and to exploit sesame diversity. In addition, intensive research is needed on the interaction of phenotype with environment to improve sesame production.

  6. Characterization of sesame genetic resources is considered a prerequisite for breeding activities. Therefore, the collection, characterization, and conservation of sesame genetic resources are urgently needed to discover new variation for further breeding programs.

  7. Ethiopian sesame accessions are less well characterized and preserved than those of other top sesame-producing countries. As a result, the country obtains a low yield of about 5 Qt/ha. Hence, special attention should be given to omics technologies to enhance the production and productivity of Ethiopian sesame accessions. In addition, shattering is the major yield-reducing factor in Ethiopian sesame. Therefore, applying genome editing and genetic transformation to solve this problem is of paramount importance.

  8. The root system architecture (RSA) of sesame is very important, as it plays a significant role in adaptation to various climatic conditions. However, no study has been conducted on the RSA of sesame, so intensive research on RSA is urgently needed.