Introduction

The identification of crop cultivars is of paramount importance for safeguarding and harnessing biodiversity; however, the process can be delayed by insufficient taxonomic expertise (Chase and Fay 2009; Pathirana and Carimi 2022). Beyond whole-plant identification, there are instances where identifying a cultivar from roots, seeds, pollen, or even plant mixtures sampled across diverse ecosystems proves valuable. In such scenarios, conventional morphological methods may be difficult or infeasible (CBOL Plant Working Group 2009).

Root and bulb vegetables (RBV) are often regarded as orphan crops that play a vital role in regional food security, and most belong to the families Apiaceae and Amaryllidaceae (formerly Alliaceae) (Bhasi et al. 2010). Alliaceae includes common food crops such as onion, garlic, and leek, which are consumed in large volumes in many parts of the world. However, the number of plant scientists dedicated to these RBV crops remains limited, potentially as few as 10–25 full-time equivalent scientists engaged in academic and government initiatives worldwide. Much of the research effort comes from breeders, geneticists, taxonomists, plant pathologists, and plant physiologists. A comparable level of dedication is shown by seed companies globally (Brooks and Vest 1985; Frey 1999). Despite the relatively modest size of this scientific community, there has been a remarkable accumulation of genomic information, notably through efforts to expand molecular genetic maps, elucidate taxonomic relationships, and investigate molecular aspects of gene expression. Interest within this compact RBV community is growing as genomic data from different domains of plant science become more accessible (Bhasi et al. 2010).

Historically, systematic and phylogenetic analyses of these pungent plants relied primarily on morphological traits (Gurcharan 2004). However, the past few decades have seen steady advances in molecular biology, particularly genetics, providing a new array of tools for unravelling their relationships and elucidating the unique chemical profiles of associated cultivars (Gounaris et al. 2002; Labra et al. 2004). Several modern molecular techniques have been employed for genomic profiling of Allium species. Significant marker systems include expressed sequence tag (EST)-derived cleaved amplified polymorphic sequence (CAPS) and single-strand conformation polymorphism (SSCP) markers (McCallum et al. 2001), intron length polymorphic (ILP) markers (Jayaswall et al. 2019; Khade et al. 2022), potential intron polymorphism (PIP) markers (Jayaswall et al. 2024), R-gene-derived molecular markers (Herlina et al. 2019), and single-nucleotide polymorphism (SNP) markers (Fujito et al. 2021).

Discrete markers such as Random Amplified Polymorphic DNA (RAPD), Amplified Fragment Length Polymorphism (AFLP), and hypervariable DNA regions (such as SSRs in the microsatellite regions) have commonly been employed in such investigations (Friesen and Klaas 1998; Labra et al. 2004; Trindade 2007; Baldwin et al. 2012; Mallor et al. 2014; Jayaswall et al. 2022; Raj et al. 2022; Chalbi et al. 2023). Despite their demonstrated effectiveness, these markers face limitations in discriminating between diverse species or cultivars (Trindade 2007; Azizi et al. 2009). Moreover, their applicability across a broad spectrum of taxa is often constrained, as they were developed for specific genera or species (Novak 2008; Segarra-Moragues and Gleiser 2009). Homozygosity must also be considered in genotypic classification, especially for inbred cultivars that may appear invariant at these marker loci (Moon et al. 2023).

The progress in sequencing and computational technologies has elevated DNA sequences to a prime source of innovative insights, enhancing our understanding of evolutionary and genetic relationships. The impacts of sequence analysis are now noticeable across nearly all domains of Biological Sciences, spanning from developmental research to epidemiology (Tibayrenc 2005). However, two distinct branches of biology have pioneered the tools and applications used to explore biological relationships through DNA sequences: molecular phylogenetics and population genetics. These fields address different tiers of organizational complexity. Molecular phylogenetics traditionally delves into evolutionary relationships among broader clades, whereas population genetics focuses on variations within and among populations of individual species. In contrast, DNA barcoding occupies an intermediary role, aiming for comprehensive species coverage while emphasizing their identification rather than relational aspects.

DNA barcoding represents a relatively recent technique that has been developed to offer swift, accurate, and automated species identification by utilizing standardized DNA sequences as tags (Hebert et al. 2003; Taberlet 2007). The origins of this approach trace back to the seminal work of Hebert et al. (2003), who demonstrated that a collection of 200 closely related lepidopteran species could be distinguished with 100% accuracy using the mitochondrial gene cytochrome c oxidase subunit I (COI). While COI proved less effective in plants, several other genetic loci have been proposed as potential plant barcodes, including Internal Transcribed Spacer (ITS) (Kress et al. 2005; Chase et al. 2007), rbcL (Newmaster et al. 2006; Kress and Erickson 2007; Hollingsworth et al. 2009; Anvarkhah et al. 2013), psbA-trnH (Chase et al. 2007; Kress and Erickson 2007; Lahaye et al. 2008; Chen et al. 2010; Gao et al. 2010; Fu et al. 2011), and matK (Chase et al. 2007; Hollingsworth et al. 2009; Zarei et al. 2020).

In the present context, barcoding has evolved into a dependable technique for species identification (Vijayan and Tsou 2010; Singh et al. 2021). The fundamental principle underlying barcoding involves comparing sequence data from an unknown sample (the specimen under study) to a reference sequence obtained from a voucher specimen. The barcode sequence of each unknown specimen is then matched against a reference barcode sequence library, established from individuals with known attributes. A species is confirmed if its sequence closely corresponds to one within the barcode library. Alternatively, novel documentation might lead to the proposal of a new barcode sequence for a known species, or it may even contribute to the recognition of a previously undiscovered species (Hajibabaei et al. 2007).
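The matching step described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in this study: it assumes the query and reference sequences are already aligned to the same length, the reference names and sequences are invented toy data, and the 97% identity cutoff is an arbitrary placeholder rather than a recommended threshold.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences, skipping gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # ignore alignment gaps
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


def best_match(query: str, library: dict, threshold: float = 97.0):
    """Return (reference name or None, identity) for the closest reference.

    The name is None when the best identity falls below the threshold,
    i.e. the query cannot be assigned to any reference barcode.
    """
    name, ref = max(library.items(), key=lambda kv: percent_identity(query, kv[1]))
    identity = percent_identity(query, ref)
    return (name if identity >= threshold else None), identity


# Hypothetical toy reference library (not real barcode sequences).
LIBRARY = {"ref_A": "ACGTACGTAC", "ref_B": "TTGTACGAAC"}
print(best_match("ACGTACGTAC", LIBRARY))
```

In practice the comparison is made against curated reference libraries with dedicated search tools; the sketch only shows the logic of assigning a query to its closest voucher-derived reference.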

The assessment of genetic diversity through molecular markers plays a pivotal role in comprehending genome structure, characterizing and preserving genetic variations within plant germplasm, pinpointing genes linked to significant traits, and formulating effective breeding strategies for crop enhancement (Hayden et al. 2010). The utilization of markers and the recognition of polymorphic nucleotide sequences dispersed across the genome have opened up fresh avenues for appraising diversity and discerning inter- and intra-species genetic relationships (Gostimsky et al. 2005).

Numerous molecular markers are at our disposal for probing genetic diversity. Among these, SSR (Tautz 1989; Becker and Heun 1994), RAPD (Williams et al. 1990), AFLP (Vos 1995), and inter simple sequence repeat (ISSR) (Zietkiewicz et al. 1994) have stood out as the most influential. However, these methods have faced significant limitations, such as the poor reproducibility of RAPD, the high costs associated with AFLP, and the need for known flanking sequences to design specific primers for SSR markers. In comparison, ISSR markers have overcome many of these challenges (Reddy et al. 2002). They offer the distinct advantages of being relatively cost-effective, showing high levels of polymorphism, and exhibiting strong reproducibility (Peng et al. 2006).

The ISSR, representing a relatively recent category of molecular markers, relies on the presence of short DNA sequences organized in tandem repeats. Notably, these inter-repeat regions display substantial polymorphism in their sizes, even among closely related genotypes, owing to the absence of evolutionary functional constraints within these non-functional domains (Rizkalla et al. 2012).

The objective of this study is to elucidate the relationships among three distinct cultivars of A. cepa (onion) through multiple approaches. These include analysis of ITS sequences, DNA barcoding with the matK, rbcL, trnH-psbA, and trnL loci, and profiling with seven ISSR markers.

Materials and methods

To assess the appropriate degree of sequence divergence within the plant genome, Allium cepa specimens were collected from three different cultivation sites: Surandai (BDUT 1453) and Alankulam (BDUT 1454) in the Tirunelveli District, and Vilathikulam (BDUT 1455) in the Tuticorin District of Tamil Nadu, India. All three specimens, representing traditional cultivars of their respective localities, were cultivated in sites closely resembling their natural habitats. Specimens were collected from the field with due permission from the farmers. For genetic analysis, fresh young roots (each weighing 200 mg) were obtained from the selected Allium plants. DNA was isolated using the CTAB method (Doyle and Doyle 1987). To ensure high-quality DNA, the isolated genetic material was purified using a spin column kit. The concentration of purified DNA for each sample was then quantified from the intensity of ethidium bromide-stained bands.

To compare the performance of various DNA markers, each sample was analyzed using ITS and four candidate DNA barcoding regions. These five DNA loci were amplified with universal primers: the Internal Transcribed Spacer (ITS) region (White et al. 1990), rbcL (Bafeel et al. 2011), matK (Costion et al. 2011), trnH-psbA (Jabbes et al. 2011), and trnL (Table 1).

Table 1 Primers used in barcoding and sequencing of Allium cepa L.

For ISSR analysis, primers were constructed according to the method outlined by Jabbes et al. (2011). Seven distinct ISSR primers were employed for the amplification of the cultivars: ISSR8US, ISSR9, ISHY 1b, ISHY 2, ISHY 3, ISHY 4, and ISSR a (Table 2). Note that the annealing temperature varied for each specific primer (Jabbes et al. 2011). Each PCR reagent mixture comprised 10 µl of Taq pre-mix, 4 µl of water, 1.5 µl of the forward primer, 1.5 µl of the reverse primer, and 3 µl of DNA.

PCR was performed on an Eppendorf ProS thermal cycler (Eppendorf, Hamburg, Germany). For the DNA barcoding and ITS markers, the program consisted of an initial denaturation of 5 min at 94 °C, followed by 35 cycles of 30 s at 94 °C, 30 s at 58 °C, and 60 s at 72 °C, with a final extension of 10 min at 72 °C.
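As a quick consistency check on the reaction set-up and the barcoding/ITS cycling program described above, the short Python sketch below adds up the reagent volumes and the programmed hold times. The step labels and the `Step` and `program_seconds` names are illustrative assumptions, and ramp times between temperatures are ignored, so the real run time on the instrument would be longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    label: str      # illustrative label, not taken from the instrument
    temp_c: float   # hold temperature in degrees Celsius
    seconds: int    # hold time in seconds


def program_seconds(initial, cycle, n_cycles, final):
    """Total programmed hold time in seconds (ramp times excluded)."""
    return initial.seconds + n_cycles * sum(s.seconds for s in cycle) + final.seconds


# Barcoding/ITS program as given in the text.
INITIAL = Step("initial denaturation", 94, 5 * 60)
CYCLE = [
    Step("denaturation", 94, 30),
    Step("annealing", 58, 30),
    Step("extension", 72, 60),
]
FINAL = Step("final extension", 72, 10 * 60)

# Reaction components in microlitres, as listed in the text.
REACTION_UL = {
    "Taq pre-mix": 10.0,
    "water": 4.0,
    "forward primer": 1.5,
    "reverse primer": 1.5,
    "template DNA": 3.0,
}

total_s = program_seconds(INITIAL, CYCLE, 35, FINAL)
print(f"programmed hold time: {total_s / 60:.0f} min")
print(f"reaction volume: {sum(REACTION_UL.values()):.1f} µl")
```

The listed components add up to a 20 µl reaction, and the 35-cycle program amounts to 85 min of hold time.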

For the ISSR markers, the PCR program comprised 40 cycles of denaturation at 94 °C for 30 s, annealing at a primer-specific temperature (Table 2), and extension at 72 °C for 2 min, followed by a final extension of 10 min at 72 °C. Subsequently, all PCR products were subjected to electrophoresis on a 1.2% agarose gel in 1X TAE buffer at 60 V for 1.30 h and visualized using a UV transilluminator. The DNA barcoding and ITS PCR products were purified prior to sequencing by the Sanger dideoxy method. The resulting sequences were imported and aligned using Molecular Evolutionary Genetics Analysis (MEGA v5.2.2).

Table 2 Characteristics of inter simple sequence repeat (ISSR) primers

AF: total amplified fragments; PF: number of polymorphic amplicons; % P: percentage of polymorphism.

Basic sequence statistics, like nucleotide frequencies, the transition/transversion (ns/nv) ratio, and variability within distinct sequence regions, were computed using the MEGA software. The sequence data were subjected to analysis through both phenetic and cladistic methods. The phenetic approach involved employing the neighbor-joining method (NJ), while the cladistic method utilized the maximum parsimony method (MP).
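The basic statistics named above can be illustrated with a small Python sketch. It is a simplified stand-in for what MEGA reports, not a reimplementation: it counts observed differences between two pre-aligned sequences, whereas MEGA's transition/transversion bias is a model-based estimate. The function names and the toy sequences are assumptions made for illustration.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def _norm(seq: str) -> str:
    """Upper-case the sequence and treat U as T."""
    return seq.upper().replace("U", "T")


def nucleotide_frequencies(seq: str) -> dict:
    """Percentage of A, C, G and T among unambiguous bases."""
    counts = Counter(b for b in _norm(seq) if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no unambiguous bases in sequence")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}


def transitions_transversions(a: str, b: str) -> tuple:
    """Count observed transitions and transversions between aligned sequences."""
    ts = tv = 0
    for x, y in zip(_norm(a), _norm(b)):
        if x == y or x not in "ACGT" or y not in "ACGT":
            continue  # identical site, gap, or ambiguity code
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            ts += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            tv += 1  # purine<->pyrimidine
    return ts, tv


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites among comparable aligned sites."""
    pairs = [(x, y) for x, y in zip(_norm(a), _norm(b)) if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


# Toy aligned sequences, for illustration only.
print(nucleotide_frequencies("AACG"))
print(transitions_transversions("AACGTT", "AGCTTA"), p_distance("AACGTT", "AGCTTA"))
```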

The ISSR marker index was calculated to assess the efficacy of each primer in identifying polymorphic loci (P) within the cultivars. Subsequently, the computation of Shannon index (I) (Lewontin 1972) and Nei’s standard genetic distance (D) (Nei 1972) took place, followed by the construction of a dendrogram using POPGENE v32 Software. For a Bayesian analysis of the ISSR data, the Structure v3.2.2 Software was utilized.
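To make the ISSR statistics concrete, the Python sketch below computes the percentage of polymorphic loci from a binary band matrix, together with Nei's gene diversity and the Shannon index for a single dominant locus. It is a hedged illustration: it assumes two alleles per locus and Hardy-Weinberg equilibrium (so that the null-allele frequency is the square root of the band-absence fraction), uses natural logarithms, and may differ in detail from the exact procedure implemented in POPGENE. The band matrix is invented toy data.

```python
import math


def percent_polymorphic(matrix) -> float:
    """Percentage of loci (columns) that vary among samples (rows).

    Entries are 1 (band present) or 0 (band absent).
    """
    n_loci = len(matrix[0])
    polymorphic = sum(
        1 for j in range(n_loci) if len({row[j] for row in matrix}) > 1
    )
    return 100.0 * polymorphic / n_loci


def locus_diversity(null_fraction: float) -> tuple:
    """Nei's gene diversity h and Shannon index I for one dominant locus.

    Assumption: q = sqrt(fraction of samples lacking the band) under
    Hardy-Weinberg equilibrium, and p = 1 - q.
    """
    q = math.sqrt(null_fraction)
    p = 1.0 - q
    h = 1.0 - p * p - q * q
    shannon = -sum(f * math.log(f) for f in (p, q) if f > 0)
    return h, shannon


# Toy band matrix: 3 samples x 4 loci (not data from this study).
BANDS = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]
print(percent_polymorphic(BANDS))
print(locus_diversity(0.25))
```

With only three samples per comparison, allele-frequency estimates of this kind are coarse, which is one reason such indices are usually averaged over many loci.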

Results

The ITS region of A. cepa was analyzed to determine the phylogenetic relationships among the cultivars and other Allium species. The ITS regions of the evaluated cultivars were 666 bp in BDUT 1453, 651 bp in BDUT 1454, and 663 bp in BDUT 1455. Similarly, the matK region differed in length among the three cultivars: 684 bp in BDUT 1453, 818 bp in BDUT 1454, and 822 bp in BDUT 1455. Likewise, the trnH-psbA region was 646 bp in BDUT 1453, 653 bp in BDUT 1454, and 645 bp in BDUT 1455. The rbcL region was 531 bp in BDUT 1453, 530 bp in BDUT 1454, and 534 bp in BDUT 1455. The trnL region was shorter than the other barcoding regions examined in this study, with a consistent length of 246 bp in all three cultivars.

Interestingly, the ITS phylogeny indicated that the three distinct Allium cultivars failed to coalesce into a single clade. In the phenetic analysis of the ITS sequences, BDUT 1453 exhibited a close relationship with A. fistulosum gi338191570, while the cultivar BDUT 1454 formed an outgroup, branching away from the cluster encompassing A. cepa gi21627887 to A. altaicum gi133919855. The cultivar BDUT 1455, in turn, showed a close relationship with A. altaicum gi259186340 (Fig. 1).

Fig. 1

NJ (Phenetic method) tree based on Internal Transcribed Spacer (ITS) region of Allium species

Among the cultivars studied, BDUT 1453 and BDUT 1454 displayed a close relationship, evidenced by a pairwise distance of 5.308 between them. Both these cultivars had diverged from their near common ancestor, BDUT 1455. Notably, the pairwise distance between BDUT 1454 and 1455 was 6.22, while that between BDUT 1453 and BDUT 1455 was 8.44.

In the cladistic analysis of the ITS sequences, the cultivar BDUT 1455 exhibited a pronounced affinity with A. altaicum gi259186340. Conversely, the cultivar BDUT 1454 diverged from the cluster that encompassed A. cepa gi444237454. Intriguingly, BDUT 1453 emerged from the cluster formed by A. fistulosum gi256596112 and A. cepa gi256596111 (Fig. 2). Remarkably, the cultivars clustered separately under both the neighbour-joining (NJ) and maximum parsimony (MP) methods, suggesting significant genetic divergence.

Fig. 2

MP (Cladistic method) tree based on Internal Transcribed Spacer (ITS) region of Allium species.

In the present study, the nucleotide frequencies calculated for the ITS region were as follows: A = 20.56%, T/U = 33.33%, C = 20.56%, and G = 25.56%. Both the neighbor-joining (NJ) and maximum parsimony (MP) methods highlighted substantial genetic divergence among the three tested onion cultivars.

The shape parameter for the discrete gamma distribution was estimated to be approximately 200.0000. Substitution patterns and rates were determined using the Tamura-Nei model (+ G) (Tamura and Nei 1993). To account for variations in evolutionary rates across sites, a discrete gamma distribution was applied with 5 categories (+ G). The mean evolutionary rates within these categories were identified as 0.90, 0.96, 1.00, 1.04, and 1.10 substitutions per site.

For this computation, the nucleotide frequencies were again A = 20.56%, T/U = 33.33%, C = 20.56%, and G = 25.56%, and the maximum log likelihood value was −245.115. The transition/transversion bias (R) was estimated to be 0.33 under the Kimura (1980) 2-parameter model. Under that model the nucleotide frequencies are taken as equal (A = 25.00%, T/U = 25.00%, C = 25.00%, and G = 25.00%), and the maximum log likelihood value for this computation was −249.497.

The phylogenetic relationships of the barcoding gene sequences (matK, rbcL, trnH-psbA, and trnL) were determined through both phenetic and cladistic methods of phylogenetic analysis. For matK, the nucleotide frequencies were A = 32.81%, T/U = 40.09%, C = 14.52%, and G = 12.58%, with a maximum log likelihood value of −2690.657. The transition/transversion bias (R) was estimated to be 0.74, with a maximum log likelihood value of −2921.083.

During the matK region analysis, BDUT 1453 exhibited divergence from A. cyaneum gi379323965. BDUT 1454 diverged from a common ancestor shared by A. scorodoprasum var. viviparum gi519670102 and A. monanthum gi519670092. BDUT 1455’s evolutionary path led from a comprehensive cluster encompassing a majority of other Allium species (Fig. 3).

Fig. 3

NJ (Phenetic method) tree based on matK region of Allium species

Similar to the ITS sequence analysis, the phylogenetic relationship among the three tested onion cultivars remained consistent. The pairwise distance between BDUT 1453 and 1454 was calculated at 3.74, while the distance between BDUT 1455 and 1454 was 6.42. The pairwise distance between BDUT 1453 and 1455 was 6.28.

In the MP analysis of the matK sequences, BDUT 1453 diverged from A. cepa gi387865328. BDUT 1454 stemmed from a cluster containing A. oleraceum gi3790415782 and A. condensatum gi379323423. BDUT 1455, on the other hand, diverged from one cluster of Allium species while also branching out from another cluster of different Allium species (Fig. 4). Notably, the shape parameter for the discrete gamma distribution was identical to that of the ITS sequence analysis. The NJ and MP analyses of matK differed in the genetic divergence they indicated among species, and the phylogenetic trees generated by the two methods accordingly had distinct topologies.

Fig. 4

MP (Cladistic method) tree based on matK region of Allium species

Examining the rbcL region of the tested cultivars revealed that BDUT 1454 and BDUT 1455 shared a close relationship, forming a cohesive cluster in both methods of phylogenetic analysis. The calculated pairwise distance between them amounted to a mere 0.01.

The NJ analysis of rbcL revealed that BDUT 1454 and BDUT 1455 diverged from the cultivar BDUT 1453, which in turn branched off as an outgroup from A. cepa gi387865420 (Fig. 5). In the MP analysis, by contrast, BDUT 1453 formed a cluster with A. cepa gi478430773, while the other two cultivars clustered together (Fig. 6). The pairwise distance between BDUT 1453 and BDUT 1455 was 5.82, while that between BDUT 1453 and BDUT 1454 was 5.84.

Fig. 5

NJ (Phenetic method) tree based on rbcL region of Allium species

Fig. 6

MP (Cladistic method) tree based on rbcL region of Allium species

The shape parameter for the discrete gamma distribution was estimated as 47.6767. The nucleotide frequencies for the four bases were A = 29.56%, T/U = 29.37%, C = 21.57%, and G = 19.50%, with a corresponding maximum log likelihood value of −1482.077. The estimated transition/transversion bias (R) was 0.51, with a maximum log likelihood value of −1501.008.

During the trnH-psbA region analysis, BDUT 1454 diverged from BDUT 1455 and aligned with A. sikkimense gi379323442 to form a cluster. Meanwhile, the other cultivar, BDUT 1453, formed a cluster with A. carinatum gi406033480, exhibiting divergence in the NJ method (Fig. 7).

Fig. 7

NJ (Phenetic method) tree based on trnH-psbA region of Allium species

In the MP method of analysis, BDUT 1453 and 1455 were observed to cluster together, exhibiting a pairwise distance of 3.05. This cluster diverged from the other Allium cultivar, which in turn formed a cluster with BDUT 1454 (Fig. 8). The calculated pairwise distances were 2.51 between BDUT 1453 and 1455, and 2.76 between BDUT 1453 and 1454. Through trnH-psbA sequence analysis, it became evident that all three onion cultivars displayed genetic divergence in both the neighbor-joining (NJ) and maximum parsimony (MP) methods. Notably, both BDUT 1454 and 1455 formed a unified cluster, stemming from the cultivar BDUT 1453.

Fig. 8

MP (Cladistic method) tree based on trnH-psbA region of Allium species

For the trnH-psbA region, the nucleotide frequencies were A = 25.00%, T/U = 48.33%, C = 13.33%, and G = 13.33%, and the maximum log likelihood obtained when estimating the shape parameter of the discrete gamma distribution was −73.982. The estimated transition/transversion bias (R) was 0.00, with a maximum log likelihood value of −81.567.

The analysis of the trnL sequence revealed that BDUT 1453 and BDUT 1454 clustered together, and both cultivars diverged from BDUT 1455 in both the NJ (Fig. 9) and MP (Fig. 10) analyses. The pairwise distance was 0.012 between BDUT 1453 and BDUT 1455, 0.016 between BDUT 1453 and BDUT 1454, and 0.029 between BDUT 1454 and BDUT 1455. The nucleotide frequencies were A = 40.51%, T/U = 24.39%, C = 14.77%, and G = 20.33%, and the maximum log likelihood obtained when estimating the shape parameter of the discrete gamma distribution was −366.703. The estimated transition/transversion bias (R) was 0.75, with a maximum log likelihood value of −385.227.

Fig. 9

NJ (Phenetic method) tree based on trnL region of Allium species

Fig. 10

MP (Cladistic method) tree based on trnL region of Allium species

A total of 115 scorable bands were generated among the cultivars using seven ISSR primers. The amplified products ranged in size from approximately 100 to 1300 bp (Figs. 11 and 12). The number of scorable bands produced by each primer ranged from 13 to 24 (Table 2). The overall count of polymorphic bands was 29, and the percentage of polymorphism was recorded at 25.21%.
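The polymorphism percentage follows directly from the band counts. As a worked check, taking PF = 29 as the number of polymorphic bands (an interpretation of the figures reported above) and AF = 115 as the total number of scorable bands, in the notation of Table 2:

```latex
\[
\%P \;=\; \frac{PF}{AF}\times 100 \;=\; \frac{29}{115}\times 100 \;\approx\; 25.2\%
\]
```

This agrees with the reported value to within rounding in the second decimal place.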

Fig. 11

ISSR banding patterns of three onion accessions generated by ISSR primers. Lanes from left to right. Lane 1: 1 kb ladder. Lanes 2–4: ISSR8US banding pattern for BDUT 1453, 1454, and 1455, respectively. Lanes 5–7: ISSR 9 banding pattern for BDUT 1453, 1454, and 1455, respectively. Lanes 8–10: ISSR a banding pattern for BDUT 1453, 1454, and 1455, respectively.

Fig. 12

ISSR banding patterns of three onion accessions generated by ISSR primers. Lanes from left to right. Lane 1: 1 kb ladder. Lanes 2–4: ISHY 1b banding pattern for BDUT 1453, 1454, and 1455, respectively. Lanes 5–7: ISHY 2 banding pattern for BDUT 1453, 1454, and 1455, respectively. Lanes 8–10: ISHY 3 banding pattern for BDUT 1453, 1454, and 1455, respectively. Lanes 11–13: ISHY 4 banding pattern for BDUT 1453, 1454, and 1455, respectively.

The observed number of alleles (na) was 2, with an effective number of alleles (ne) of 1.6274. Nei’s genetic diversity (h) was 0.3718, while Shannon’s information index (I) was 0.5544. The values for Ht, Hs, Gst, and Nm were 0.3718, 0.3314, 0.1085, and 4.1065, respectively. The average number of alleles per locus was 2.3. The estimated Ln probability of data was −20.3, with a mean Ln likelihood of −19.8 and a variance of Ln likelihood of 1.1. The ISSR analysis showed mean values of Fst-1, Fst-2, and Fst-3 of 0.0415, 0.0166, and 0.0076, respectively. Additionally, the average distances of clusters 1, 2, and 3 were 0.6159, 0.6142, and 0.6136, respectively, as depicted in the bar plot (Fig. 13). Notably, the L(K) obtained with Structure demonstrated a clear distinction among the different cultivars.
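The differentiation and gene-flow figures can be cross-checked from the reported Ht and Hs. The Python sketch below uses Nei's coefficient of gene differentiation, Gst = (Ht − Hs)/Ht, and the commonly used estimate Nm = 0.5(1 − Gst)/Gst; that these are the exact formulas applied by the software in this study is an assumption, and the function names are illustrative.

```python
def gst(ht: float, hs: float) -> float:
    """Nei's coefficient of gene differentiation, Gst = (Ht - Hs) / Ht."""
    if ht <= 0:
        raise ValueError("Ht must be positive")
    return (ht - hs) / ht


def gene_flow(gst_value: float) -> float:
    """Gene-flow estimate Nm = 0.5 * (1 - Gst) / Gst."""
    if gst_value <= 0:
        raise ValueError("Gst must be positive")
    return 0.5 * (1.0 - gst_value) / gst_value


# Ht and Hs as reported in the text.
g = gst(0.3718, 0.3314)
print(f"Gst = {g:.4f}, Nm = {gene_flow(g):.4f}")
```

The recomputed values (Gst of about 0.109 and Nm of about 4.10) are close to the reported 0.1085 and 4.1065; small differences of this size are expected when starting from inputs rounded to four decimals.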

Fig. 13

Bayesian proportion of individual plants for a K = 3 population model. The populations identified by the Structure software are indicated in different colours

Discussion

To elucidate the phylogenetic relationship between A. cepa and other Allium species, a comparison was conducted using ITS and barcoding sequences. The phenetic and cladistic methods of phylogenetic analysis yielded distinct tree topologies for the ITS sequences. The ITS region has been extensively employed for phylogenetic investigations in A. cepa by various researchers (Dubouzet and Shinoda 1998, 1999; Mes et al. 1999; Friesen et al. 2000; Fritsch and Friesen 2002). In particular, Dubouzet and Shinoda (1999) proposed that DNA sequence analysis, specifically of the ITS sequence, serves as a valuable tool for understanding the intrageneric organization of the genus Allium.

The initial phase of this study aimed to establish the universality of the five candidate DNA markers. To achieve this, we assessed which DNA markers were consistently amplified and sequenced in the largest number of analyzed samples. To simplify interpretation of the outcomes, only the most universally effective primer combination for each candidate DNA marker was examined. The ITS phylogeny showed that the three distinct Allium cultivars failed to merge into a single clade. In terms of phylogenetic clustering patterns, both the NJ and MP analyses of the ITS region revealed significant genetic divergence among the three Allium cultivars studied.

In a study conducted by Ipek et al. (2008), the ITS sequences of diverse Allium species were examined to elucidate the phylogenetic connection between A. tuncelianum and other Allium species. The investigation revealed that the NJ dendrogram and the consensus tree resulting from parsimony analysis had the same topology. Intriguingly, both analyses placed A. tuncelianum within the clade of subgenus Allium, along with garlic. The monophyly of section Allium was confirmed by Hirschegger et al. (2010), who identified four main clades in all ITS analyses. However, the interconnections among these clades and the remaining species within section Allium remained unresolved. cpDNA-based phylogenetic trees identified two major clades, though the resulting topology only partially correlated with that of the ITS tree. To trace the presumed parent species of polyploid taxa, a method involving intra-individual polymorphism of the ITS region was used. The phylogenetic relationships of the barcoding locus reported by Da-Cruz (2012) introduced a degree of confusion because of similarities across different species.

Nguyen et al. (2008) developed a phylogenetic tree employing ITS alone and in conjunction with ETS. This combined approach facilitated a comprehensive assessment of evolutionary relationships between Allium species. Notably, the ITS region on its own offered substantial insights and determined the broader relationships among species. The incorporation of the second marker (ETS) not only reinforced the phylogenetic positions of the species but also enhanced resolution within the subgenus.

One striking characteristic of the ITS data is the unusually large intrageneric genetic distances within Allium. Distances exceeding 40% based on Kimura calculations were identified by Friesen et al. (2000) and by Dubouzet and Shinoda (1999). Such distances often typify the most remotely related genera within subfamilies or even families (Baldwin et al. 1995; Blattner and Kadereit 1999; Hsiao et al. 1999; Noyes and Rieseberg 1998). In stark contrast, intrageneric distances within other plant families predominantly remain below the 10% threshold (Baldwin et al. 1995).

These findings position Allium as either a remarkably rapid-evolving taxon or one of ancient origin, in which molecular evolution has not been accompanied by the emergence of a comparable number of taxonomic categories. Moreover, the phylogenetic analyses in the present study showed that all three cultivars belonging to the same species formed distinct clusters, indicative of their genetic divergence.

The CBOL Plant Working Group (2009) proposed rbcL and matK as the standard barcodes for land plants. This combination represents a pragmatic compromise among universality, sequence quality, discrimination, and cost. In their study, rbcL and matK together yielded a species discrimination success rate of 72% within the examined sample set, with the remaining species matched to groups of congeneric species with a 100% success rate. Selecting a plant barcode from the available candidate loci is a challenging task. Each locus (matK, rbcL, trnH-psbA, and trnL) possesses highly desirable attributes for an effective plant DNA barcoding system; however, none of these four loci meets all the criteria.

In the present study, the phylogenetic analysis of the matK region showed BDUT 1455 clustering distinctly from a comprehensive cluster of other Allium species. Although the NJ and MP analyses of the matK region showed slightly different patterns of genetic divergence and distinct topologies, the relationship among the three tested cultivars remained consistent with the results of the ITS region analysis.

Among the plastid genes, rbcL is the most highly regarded. Advances in primer design have improved its accessibility across various land plants (Fazekas et al. 2008), making it well suited for generating good-quality bidirectional sequences. Regarded as an exemplary multi-locus candidate, rbcL demonstrates excellent performance among the most variable regions, facilitating species discrimination.

Both the NJ and MP analyses of the rbcL region produced a unified clustering pattern, indicating a close relationship among the three tested cultivars. Although very low divergence has been reported for rbcL, especially among closely related species (Newmaster et al. 2008; Liu et al. 2010), Liu et al. (2010) highlighted its potential suitability for Bryophyta barcoding.

Insights from Kress et al. (2005) indicated that the trnH-psbA spacer ranged from 119 to over 100 bp across the studied angiosperms. This variability in length could lead to alignment difficulties, which may impede the effectiveness of DNA barcoding because of the substantial number of insertions and deletions within trnH-psbA. For instance, a 94% match between two trnH-psbA sequences in Trigonella foenum-graecum L. might reflect either intraspecific variation or the misidentification of a voucher. Consequently, using multiple voucher sequences becomes essential, particularly for barcoding regions known to be highly similar (Schori and Schowalter 2011).

As reported by Kress and Erickson (2007), trnH-psbA showed robust amplification across a range of land plants using a single pair of primers, while achieving high levels of species discrimination. Nonetheless, the major obstacle for this locus remains the difficulty of obtaining high-quality bidirectional sequences. The conclusions drawn by Friesen et al. (2000) suggest that Allium is either an exceptionally fast-evolving taxon or one with ancient origins. This implies that molecular evolution within Allium does not necessarily coincide with a proportionate increase in taxonomic categories. In the present study, both the NJ and MP analyses of the trnH-psbA sequences revealed genetic divergence among the three cultivars tested.

Research conducted by Cowan et al. (2006) involving 96 species of Sinningia (Gesneriaceae) showed a remarkable 95% probability of correct identification through the trnS-trnG, trnT-trnL, rpl16, trnL-trnF, atpB-rbcL, and ncpGS markers. In the present study, the NJ and MP analyses of the trnL region revealed genetic divergence among the three tested Allium cultivars, in line with the other phylogenetic analyses conducted here.

ISSR markers stand out for studying intraspecific variation, particularly because of their ability to detect even low levels of genetic polymorphism in plants (Zietkiewicz et al. 1994). In comparison, other methods such as RAPD and microsatellite-primed PCR markers tend to yield lower levels of polymorphism, while ISSR markers offer enhanced reliability and reproducibility of bands (Sonnante and Pignone 2001). Consequently, RAPD and various other molecular markers have been employed for studying intraspecific polymorphism, whereas ISSR has shown its proficiency in unravelling interspecific diversity (Nagaoka and Ogihara 1997; Hao et al. 2002; Goldman 2008; Bianco et al. 2011; Poczai et al. 2011; Mukherjee et al. 2013). In the present study, ISSR analysis using the Structure package exposed clear distinctions in the banding patterns and thus revealed a significant level of intraspecific diversity among the cultivars of A. cepa. This is consistent with the findings of Mukherjee et al. (2013), who reported high levels of polymorphism within A. sativum through ISSR analysis.

Conclusion

In conclusion, the molecular evidence presented in this study points to significant interspecific diversity and intraspecific divergence within A. cepa. This could potentially arise from the distinct characteristics of the various cultivation sites, leading to random genetic drift. The barcode loci employed in this research yield substantial insights into A. cepa. To improve the accuracy of identifying this species, future research should incorporate a broader range of samples from diverse cultivation sites.