Introduction

Coconut (Cocos nucifera L.) has long been appreciated worldwide as a multi-functional crop with economic, environmental and social significance (Niral and Jerard 2018). In Vietnam, commercialization of fresh coconuts and related products accounts for a significant percentage of household income in productive provinces, particularly in Ben Tre, Tien Giang, and Tra Vinh. Following molecular techniques developed for important crops such as rice (Oryza sativa L.) and corn (Zea mays L.), coconut genetic diversity has increasingly been studied and characterized (Rajesh et al. 2018). In the past, diversity studies primarily related to the presence of polyphenols, isozymes, carotenols or protein markers. Amongst these early markers, analysis of polyphenols using high-performance liquid chromatography revealed the highest levels of polymorphism (Chempakam and Ratnambal 1993). Since the 1990s, DNA markers have been widely used to analyze coconut genetic diversity, with numerous studies using Random Amplification of Polymorphic DNA (RAPD) and Simple Sequence Repeats (SSR) techniques (Ashburner et al. 1997; Everard et al. 1996; Hayati and Hartana 2000; Rivera et al. 1999; Perera et al. 2000; Meerow et al. 2003; Gunn et al. 2011; Loiola et al. 2016; Preethi et al. 2020). Other studies have used marker systems including Restriction Fragment Length Polymorphism (RFLP) (Lebrun et al. 1998), Inverse Sequence-Tagged Repeat (ISTR) (Rohde et al. 1996; Duran et al. 1997), Inter Simple Sequence Repeats (ISSR) (Manimekalai et al. 2006), WRKY Loci (Mauro-Herrera et al. 2007; Meerow et al. 2009) and Start Codon Targeted Polymorphism (SCoT) (Rajesh et al. 2015). Major achievements in coconut genetic study have successfully contributed to (i) differentiation based on the level of genetic variation in Tall and Dwarf varieties (Perera et al. 2000), (ii) construction of genetic maps (Rohde et al. 2002; Lebrun et al. 2001), (iii) construction of phylogenetic dendrograms providing evidence for origin and distribution (Upadhya et al. 2004; Dasanayaka et al. 2009), (iv) suggested quantitative trait loci (QTLs) for mite resistance and the pandan aroma (Shalini et al. 2007; Saensuk et al. 2016), and (v) identification of genes involved in somatic embryogenesis, lipid biosynthesis and salt tolerance (Rajesh et al. 2016a; Reynolds et al. 2019; Yang et al. 2021).

Start codon targeted (SCoT) polymorphism is a straightforward and powerful DNA marker system that was first presented by Collard and Mackill (2009). The single primers used in PCR reactions are based on the short conserved region surrounding the ATG translation start codon in plant genes on both DNA strands. Point mutations at the primer-binding site or indel mutations within the amplified regions contribute to varying banding patterns that have successfully been used to characterize polymorphism in rice (Collard and Mackill 2009), longan (Chen et al. 2010) and grapes (Zhang et al. 2011). A screening study by Rajesh et al. (2015), involving 25 markers previously described by Collard and Mackill (2009), found that 15 showed consistent amplification patterns for a set of 23 coconut varieties, and these were therefore recommended for further analysis. SCoT markers have also proved useful in identifying true hybrid seedlings in areca nut palms (Areca catechu L.), a member of the same family (Arecaceae) as coconut. Based on a 1300 bp polymorphic band produced by primer SCoT11, Rajesh et al. (2016b) developed a sequence-characterized amplified region (SCAR) marker that can distinguish true Areca nut Tall x Hirehalli Dwarf hybrid seedlings from false seedlings produced by self-pollination of Hirehalli Dwarf mother palms. Here, we examined the suitability of 15 SCoT primers for future selective breeding programs in Vietnamese coconuts and assessed the genetic diversity of a subset of varieties conserved at the Dong Go Coconut Research Center (Ben Tre, Vietnam).

Materials and methods

Plant materials and DNA isolation

Genetic materials were sourced from 19 coconut varieties, comprising thirteen Dwarf and six Tall varieties (Table 1). Each variety was represented by three different individual palms. Fresh leaves were collected from the Dong Go Coconut Research Center (Giong Trom, Ben Tre province, Vietnam). DNA was isolated from the 57 leaf samples (19 varieties × 3 palms) using the GeneJet Plant Genome Extraction Kit (Thermo Scientific) as described by the manufacturer. The quality and quantity of DNA were verified using 1.0% agarose gel electrophoresis, a NanoDrop Spectrophotometer and PCR amplification of the plant’s housekeeping gene COX (mitochondrial cytochrome oxidase).

Table 1 List of coconut varieties used in the study

SCoT markers analysis

Three biological replicates were used for each variety to examine and account for the within-population genetic diversity of each variety. All samples were amplified with 15 SCoT primers, previously screened for clarity by Rajesh et al. (2015). Details of the primers, synthesized by Integrated DNA Technologies (Vietnam), are provided in Table 2. The reaction mixture had a total volume of 25 μL and contained 5 μL 5 × MyTaq Buffer, 0.25 μL MyTaq DNA Polymerase (Bioline), 2 μL primer (10 μM), 15.75 μL distilled water and 2 μL template DNA (3 ng/μL). The concentration of MyTaq DNA Polymerase in the reaction mixture was 0.05 units/μL. All PCR amplifications were carried out in a GeneQ™ Thermal Cycler (Bioer, China) under the following conditions: initial denaturation at 94 °C for 3 min, followed by 35 cycles of denaturation at 94 °C for 1 min, annealing for 1 min at the primer-specific temperature (Tm) presented in Table 2, and extension at 72 °C for 1 min. The final extension at 72 °C lasted 7 min. Three microliters of PCR product from each reaction were mixed with 1.5 μL of 0.6 × GelRed™ (ABT, Vietnam) and separated on a 1.5% agarose gel in 0.5 × TBE buffer by electrophoresis at 50 V for 45 min, using HyperLadder™ 1 kb (Bioline) as a molecular weight standard. Gels were visualized under UV light in a UVP™ GelDoc-It® imaging system.
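
The reaction composition above can be checked with a few lines of arithmetic. The sketch below is only illustrative: the component volumes are taken from the text, while the polymerase stock concentration of 5 units/μL is an assumption (it is not stated in the text) chosen because it reproduces the stated 0.05 units/μL.

```python
# Reaction components in microlitres, as listed in the text.
volumes_ul = {
    "5x MyTaq Buffer": 5.0,
    "MyTaq DNA Polymerase": 0.25,
    "primer (10 uM)": 2.0,
    "distilled water": 15.75,
    "template DNA (3 ng/uL)": 2.0,
}
total_ul = sum(volumes_ul.values())

# ASSUMPTION: polymerase stock at 5 units/uL (not stated in the text).
POLYMERASE_STOCK_U_PER_UL = 5.0

# Final concentrations or amounts in the assembled reaction.
polymerase_u_per_ul = (
    volumes_ul["MyTaq DNA Polymerase"] * POLYMERASE_STOCK_U_PER_UL / total_ul
)
primer_final_um = volumes_ul["primer (10 uM)"] * 10.0 / total_ul
template_ng = volumes_ul["template DNA (3 ng/uL)"] * 3.0
buffer_final_x = volumes_ul["5x MyTaq Buffer"] * 5.0 / total_ul

print(f"total volume: {total_ul} uL")
print(f"polymerase:   {polymerase_u_per_ul:.3f} units/uL")
print(f"primer:       {primer_final_um:.2f} uM")
print(f"template DNA: {template_ng:.1f} ng per reaction")
print(f"buffer:       {buffer_final_x:.1f}x")
```

Under these assumptions the volumes sum to the stated 25 μL, and each reaction holds 6 ng of template DNA with primer at a final concentration of 0.8 μM.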

Table 2 Sequence and melting temperature of SCoT primers

Data analysis

The gel images were analyzed with CLIQS 1D software, version 1.4, from TotalLab (Newcastle, UK). The bands were scored as either present (1) or absent (0), and their respective molecular weights were calculated in base pairs (bp). Band matching for each gel image was carried out automatically by the program. A match was declared if the retardation factor (Rf) value of a band was within 0.015 units of that of another band.
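
The matching rule can be illustrated with a small sketch. This is not the CLIQS algorithm, whose internals are not described here. It is a hypothetical greedy procedure that pools the bands of all lanes, sorts them by Rf, and starts a new locus whenever the gap to the previous band exceeds the tolerance. The function name `score_bands`, the lane names and the Rf values are invented for illustration.

```python
def score_bands(lanes, tolerance=0.015):
    """Group bands into loci by Rf and score each lane as 1 (present) or 0 (absent).

    lanes: dict mapping a lane name to a list of Rf values.
    Returns (loci, matrix); loci is a list of lists of (rf, lane) tuples and
    matrix maps each lane name to its 0/1 profile across the loci.
    """
    # Pool every band as (Rf, lane name) and sort by Rf.
    pooled = sorted((rf, name) for name, rfs in lanes.items() for rf in rfs)

    loci = []
    for rf, name in pooled:
        # A band joins the current locus if it lies within the tolerance
        # of the previous band; otherwise it opens a new locus.
        if loci and rf - loci[-1][-1][0] <= tolerance:
            loci[-1].append((rf, name))
        else:
            loci.append([(rf, name)])

    matrix = {
        name: [1 if any(lane == name for _, lane in locus) else 0 for locus in loci]
        for name in lanes
    }
    return loci, matrix


# Hypothetical Rf values for three replicate palms of one variety.
example_lanes = {
    "palm_1": [0.120, 0.300, 0.550],
    "palm_2": [0.125, 0.310, 0.700],
    "palm_3": [0.118, 0.552],
}
loci, matrix = score_bands(example_lanes)
for lane, profile in matrix.items():
    print(lane, profile)
```

Because matching is chained from band to band, a long run of closely spaced bands would collapse into a single locus; a production tool would need a stricter rule for such cases.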

The Polymorphic Information Content (PIC) of each locus was calculated using the formula for dominant markers as \(PIC = 1 - \sum_{i=1}^{n} f_{i}^{2}\), where \(f_{i}\) is the frequency of the \(i\)th allele (De Riek et al. 2001). The resolving power (Rp) of each primer was calculated according to a previous study as \(Rp = \sum Ib\), where \(Ib = 1 - (2 \times |0.5 - f_{i}|)\) is the informativeness of band \(i\) and \(f_{i}\) is here the proportion of samples containing that band (Prevost and Wilkinson 1999). The heterozygosity (He) value of each variety was computed according to Nei’s formula for the within-population variation index as \(He = 1 - \sum_{i=1}^{n} f_{i}^{2}\) (Nei 1978). The unbiased heterozygosity (uHe) value was calculated as \(uHe = [2N / (2N - 1)] \times He\), where \(N\) is the number of samples. Shannon’s Information index was determined using \(I = -\sum_{i=1}^{n} f_{i} \log_{2}(f_{i})\) (Lewontin 1972). The unbiased Shannon’s Information index (uI) value was calculated as \(uI = [2N / (2N - 1)] \times I\).
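
A minimal sketch of these indices follows, written directly from the formulas above. It assumes that each dominant-marker locus is treated as having two "alleles" (band present with frequency p, band absent with frequency 1 − p); that encoding, the function names and the example frequencies are illustrative assumptions, not the exact procedure of the software used in the study.

```python
import math


def sum_of_squares_index(freqs):
    """1 - sum(f_i^2); the form shared by PIC and He in the text."""
    return 1.0 - sum(f * f for f in freqs)


def band_informativeness(f):
    """Ib = 1 - 2 * |0.5 - f| for one band present in a proportion f of samples."""
    return 1.0 - 2.0 * abs(0.5 - f)


def resolving_power(band_freqs):
    """Rp = sum of Ib over all bands scored for one primer."""
    return sum(band_informativeness(f) for f in band_freqs)


def shannon_index(freqs):
    """I = -sum(f_i * log2(f_i)); zero frequencies contribute nothing."""
    return -sum(f * math.log2(f) for f in freqs if f > 0)


def unbiased(value, n_samples):
    """Apply the 2N / (2N - 1) correction used for uHe and uI."""
    return (2 * n_samples) / (2 * n_samples - 1) * value


# Hypothetical band frequencies for one primer (not data from this study).
band_freqs = [0.5, 0.25]
for p in band_freqs:
    alleles = [p, 1.0 - p]  # ASSUMPTION: presence/absence treated as two alleles
    print(p, sum_of_squares_index(alleles), shannon_index(alleles))
print("Rp:", resolving_power(band_freqs))
print("uHe for He = 0.5, N = 3:", unbiased(0.5, 3))
```

With this two-state encoding the per-locus PIC cannot exceed 0.5, which is reached when a band is present in exactly half of the samples.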

Multi-dimensional scaling analysis was undertaken by applying the Principal Coordinate Analysis (PCoA) method to the dissimilarity matrix of pairwise Nei’s genetic distances between populations. Analysis of Molecular Variance (AMOVA) was performed with 9999 permutations for all data. Hierarchical cluster analysis was carried out by applying UPGMA to the matrix of Simple Matching similarity coefficients. Calculations and data visualizations were performed using Microsoft Excel (2010), GenAlEx 6.5 (Peakall and Smouse 2006, 2012), PopGene 1.32 (Yeh et al. 1997) and NTSYS-pc 2.1 (Rohlf et al. 2002).
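
As an illustration of the distance that feeds the PCoA, the sketch below computes Nei’s standard genetic distance between two populations from per-locus allele frequencies. It assumes two alleles per locus (frequencies p and 1 − p), omits any small-sample correction, and takes the frequencies as given; how the software used in the study estimates them from band data is not covered here. The function name and the example values are hypothetical.

```python
import math


def nei_distance(p_x, p_y):
    """Nei's standard genetic distance D = -ln(I) for biallelic loci.

    p_x, p_y: per-locus frequencies of one allele in populations X and Y
    (the other allele has frequency 1 - p). The identity I is
    Jxy / sqrt(Jx * Jy), with each J summed over loci.
    """
    jx = sum(p * p + (1 - p) ** 2 for p in p_x)
    jy = sum(p * p + (1 - p) ** 2 for p in p_y)
    jxy = sum(px * py + (1 - px) * (1 - py) for px, py in zip(p_x, p_y))
    identity = jxy / math.sqrt(jx * jy)
    if identity == 0:
        return math.inf  # no shared alleles at any locus
    return -math.log(identity)


# Hypothetical allele frequencies at four loci (not data from this study).
pop_a = [0.9, 0.5, 0.2, 0.7]
pop_b = [0.8, 0.4, 0.6, 0.7]
print(nei_distance(pop_a, pop_b))
print(nei_distance(pop_a, pop_a))
```

The distance is zero (up to floating-point error) for identical frequency profiles and grows as the two populations share fewer alleles.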

Results

SCoT polymorphism

A total of 3774 amplicons were obtained, with ca. 251.6 bands per primer. The length of the bands ranged from 150 to 8500 bp. By matching molecular weights of the amplicons between corresponding replicates, we discovered 306 unique loci with an average of 20.4 unique resolvable bands amplified per primer. The number of loci with reproducible and polymorphic bands was 180, with an average of 12.0 bands per primer, ranging from 5 bands (SCOT19) to 17 bands (SCOT13). The Rp values ranged between 1.895 (SCOT04) and 9.263 (SCOT13), with an average of 5.733. Total Polymorphic Information Content (PIC) ranged from 1.518 (SCOT04) to 6.039 (SCOT13), with an average of 3.849. Average PIC over loci (PICave) ranged from 0.217 (SCOT04) to 0.390 (SCOT18), with an average of 0.316. Table 3 presents the number of amplicons and loci, and the Rp and PIC values, for each SCoT primer. PCR amplification profiles produced with two representative SCoT primers (SCOT04 and SCOT15) are shown in Fig. 5.

Table 3 Primer information, scored bands and discriminating power statistics

Genetic diversity analysis among coconut varieties

Based on allele distribution statistics presented in Table 4, the frequency of all reproducible polymorphic loci was greater than 5%, suggesting that no band was unique to a single variety. The number of bands was highest in XFD (158), AROD (151) and XLGD (150), and lowest in MRD (29), EGD (37) and SLRD (45). Furthermore, the number of bands present in 25% or fewer of the populations was highest in XFD (11) and lowest in MYD, MRD, SLRD and TQYD (0). The number of less common bands, present in 50% or fewer of the populations, was highest in XFD (50) and lowest in MRD (2). Likewise, the mean unbiased expected heterozygosity (uHe) ranged from 0.058 (MRD) to 0.410 (XFD), with an average of 0.237. The mean unbiased Shannon’s Information index ranged between 0.092 (MRD) and 0.592 (XFD), with an average of 0.352. Applying the two-tailed t-test and assuming unequal variances, the Nei’s genetic diversity index (uHe) of Vietnamese Dwarfs (M = 0.319, SD = 0.066) was higher than that of the non-native Dwarf varieties (M = 0.125, SD = 0.051), t(11) = 5.916, p < 0.001. The AMOVA presented in Table 5 indicated that most of the variation occurred within varieties (90.59%), with significant differentiation among varieties (ΦPT = 0.094, p = 0.009).

Table 4 List of allele distribution statistics for 19 coconut varieties
Table 5 Analysis of Molecular Variance (AMOVA) for 19 coconut varieties

Multi-dimensional scaling analysis

The first three axes of the PCoA plot derived from Nei’s genetic distances explained 34.96%, 10.45% and 7.81% of the variation among varieties, respectively (53.22% cumulatively). The 19 coconut varieties were separated into four distinct groups, as shown in Fig. 1. The clearest association was seen among six non-native Dwarf varieties: one Sri Lankan (SLRD), two African (GYD, EGD), two Malaysian (MYD, MRD) and one Filipino Dwarf (CAT). Four Vietnamese varieties, comprising three Dwarfs and one hybrid (XLGD, AROD, XGD and DAURT), were also clustered together. The third group was made up of four Tall varieties (WAT, BAOT, TAAT01 and MACT02) and one Dwarf (EOD). Two Dwarfs from Vietnam harvested primarily for coconut water (SOCD and TQYD) exhibited a close relationship in this analysis. XFD and SNRT showed no meaningful association with any group.

Fig. 1 Three-dimensional representation of Principal Coordinate Analysis (PCoA) of pairwise Nei’s genetic distances obtained by Genalex 6.51 software

Hierarchical clustering analysis

The average pairwise Simple Matching (SM) coefficient was 0.611. The coefficient was lowest between two Dwarf varieties from Sri Lanka (SLRD) and Vietnam (XFD) at 0.411, and highest between two Dwarf varieties from Sri Lanka (SLRD) and Malaysia (MRD) at 0.956. The Vietnamese XFD showed the lowest coefficients against all foreign Dwarf varieties. Bootstrap values for branches on the dendrogram constructed from UPGMA analysis ranged between 26% and 90%, as shown in Fig. 2. Two main clusters, designated ‘A’ and ‘B’, diverged at a similarity coefficient of 0.550, meaning that they matched in band presence or absence at 55% of the scored bands. Each cluster contained two subdivisions (A.1 and A.2; B.1 and B.2). Cluster A was characterized by five varieties (DAURT, SOCD, XLGD, AROD and XFD) that all had small fruit sizes and high sugar levels of 7.5–8.5 °Bx. Within Cluster A, Clade A.1 contained one hybrid (DAURT) and two Dwarf (SOCD, XLGD) varieties from Vietnam, while Clade A.2 contained two Dwarf varieties from Vietnam (AROD and XFD). Cluster B contained both Tall and Dwarf varieties. Sub-cluster B.1 contained the two clades with the highest similarity values (Clade B.1.1 and Clade B.1.2). Clade B.1.1 contained six Dwarf varieties from Equatorial Guinea, Malaysia, Sri Lanka, the Philippines, and Vietnam (EGD, MYD, MRD, SLRD, CAT, and EOD). These varieties produced a medium quantity of coconut water (133.3–220 g per nut), with a sugar level of 6.75 °Bx (data not shown). Clade B.1.2 contained three Tall varieties from Vietnam and West Africa (MACT02, WAT and TAAT01). These all had oil content in the upper-medium range (ca. 65% of dry weight) (data not shown). Sub-cluster B.2 contained a Vietnamese Dwarf variety (TQYD) and two Tall varieties from the Philippines (SNRT and BAOT). These varieties were less productive in fruit yield (80–90 fruits/tree/year).
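
To make the clustering procedure concrete, the following sketch computes Simple Matching coefficients from 0/1 band profiles and then applies average-linkage (UPGMA) agglomeration to the resulting similarities. It is a plain illustration of the technique rather than the NTSYS-pc implementation; the function names, variety labels (v1 to v4) and band profiles are hypothetical.

```python
from itertools import combinations


def simple_matching(a, b):
    """Share of loci at which two 0/1 profiles agree (joint presence or absence)."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def upgma(profiles):
    """Average-linkage clustering on Simple Matching similarities.

    profiles: dict mapping a name to its 0/1 band profile.
    Returns the merges in order as (cluster_1, cluster_2, average similarity),
    where each cluster is a tuple of member names.
    """
    sim = {
        frozenset((a, b)): simple_matching(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }

    def avg_sim(pair):
        # UPGMA: mean of all pairwise similarities between members of two clusters.
        c1, c2 = pair
        total = sum(sim[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = max(combinations(clusters, 2), key=avg_sim)
        merges.append((best[0], best[1], avg_sim(best)))
        clusters = [c for c in clusters if c not in best] + [best[0] + best[1]]
    return merges


# Hypothetical band profiles (not data from this study).
profiles = {
    "v1": [1, 1, 0, 0, 1, 0, 1, 1],
    "v2": [1, 1, 0, 0, 1, 0, 1, 0],
    "v3": [0, 0, 1, 1, 0, 1, 1, 1],
    "v4": [0, 0, 1, 1, 0, 1, 0, 1],
}
merges = upgma(profiles)
for left, right, similarity in merges:
    print(left, right, round(similarity, 3))
```

The similarity at which the last two clusters merge plays the same role as the 0.550 divergence point between Clusters A and B in the dendrogram.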

Fig. 2 Dendrogram generated from the UPGMA clustering method using Simple Matching coefficients of 19 coconut varieties based on SCoT analysis. Numbers on the branches represent percentage bootstrap support based on 1000 iterations

Discussion

The SCoT system was designed with longer primers than those used for RAPD, modeled after the short conserved regions flanking the ATG translation start codon in plants. This characteristic confers higher reproducibility, with a greater possibility of amplifying meaningful regions of the genome compared to RAPD markers (Collard and Mackill 2009). This study suggested that SCoT primers can detect a high level of genetic diversity among coconut varieties, with an average PIC value of 0.316 ± 0.051, comparable to that obtained by SSR and RAPD primers (Rivera et al. 1999; Meerow et al. 2003; Rajesh et al. 2014). By plotting the markers according to their PIC values and numbers of reproducible polymorphic bands (Fig. 3), we identified a group of markers that performed well on both measures (SCOT18, SCOT13, SCOT15, SCOT21, SCOT22 and SCOT23), in contrast to another group with low values for both measures (SCOT04 and SCOT25). The primers of the first group should therefore be prioritized in future coconut studies.

Fig. 3 Scatter plot of 15 SCoT primers based on values for PIC (Polymorphic Information Content) and Nr (Number of reproducible polymorphic bands)

The allele distribution revealed by SCoT primer amplification indicated a high degree of diversity in the seven Dwarf varieties originating from Vietnam (SOCD, EOD, TQYD, XGD, XFD, AROD and XLGD) compared to a relatively high degree of uniformity within each of the six non-native Dwarf varieties (GYD, EGD, CAT, MYD, MRD and SLRD), as shown in Fig. 4. Applying the two-tailed t-test and assuming unequal variances, the Nei’s genetic diversity index (uHe) of Vietnamese Dwarfs (M = 0.319, SD = 0.066) was higher than that of the non-native Dwarf varieties (M = 0.125, SD = 0.051), t(11) = 5.916, p < 0.001. The Nei’s genetic diversity index measures the probability that two individuals, chosen at random from the population, do not exhibit the same allele for the marker being studied. Therefore, our results suggested that within-population diversity for Vietnamese Dwarf varieties was high (Fig. 5).

Fig. 4 Mean Nei’s genetic diversity index (uHe) with standard error for foreign Dwarf (FD), Vietnamese Dwarf (VD), foreign Tall (FT), Vietnamese Tall (VT), all Dwarf (AD) and all Tall (AT) coconut varieties

Fig. 5 Amplification profiles of two SCoT primers in the coconut varieties used in this study: (a) PCR amplification with primer SCOT04; (b) PCR amplification with primer SCOT15

Previous genetic marker studies on coconut revealed that Dwarf varieties tend to be more homozygous by virtue of their self-pollinating nature, resulting in less variability (Perera et al. 1998, 2000). Sampling for this study was carefully carried out in the most established germplasm collection in Vietnam. Thus, the high within-population diversity for Vietnamese Dwarf varieties revealed above suggests that traditional farming practices in Vietnam have encouraged cross-fertilization. Unlike other regions where monoculture coconut plantations are well established, most coconut palms in Ben Tre, Vietnam are planted as mixtures of varieties in small, high-density plots, often intercropped with several fruits and vegetables. The high density of palms, coupled with the great diversity of pollinating insects, might have contributed to cross-pollination even in traditionally self-pollinating Dwarfs. Dong Go Coconut Research Center selected native mother palms from surrounding small farms. Thus, the results presented here most likely reflect the genetic diversity present in the Ben Tre region.

This high level of genetic diversity in Vietnamese Dwarf varieties has not been observed in previous studies. Southeast Asia, rather than other regions such as China, America and Africa, is regarded as the center of domestication of Dwarf coconuts. Hence, coconuts from this region tend to have a greater genetic diversity (Perera et al. 2000; Xiao et al. 2013). Nevertheless, in most studies that included Vietnamese Dwarfs, the genetic diversity of the Vietnamese group was not significantly different from that of populations from other Southeast Asian countries (Perera et al. 2003; Geethanjali et al. 2018). The allele distribution revealed by this study suggests that a closer examination of the genetic material of Vietnamese Dwarfs will be of interest, especially since greater diversity allows for better and more sustainable resistance to diseases such as lethal yellowing (Baudouin et al. 2009).

Shannon’s Information index was higher in Vietnamese Dwarf varieties (M = 0.471, SD = 0.066) than in non-native Dwarf varieties (M = 0.190, SD = 0.052), t(11) = 6.107, p < 0.001. Shannon’s index is a statistical measure of entropy, based on the weighted geometric mean of the frequency of each allele (Hennink and Zeven 1990; Spellerberg and Fedor 2003). If a marker has a greater number of alleles that are more evenly distributed across populations, this will result in a higher Shannon’s index value. The lower number of alleles in non-native varieties might be due to the small number of qualified foreign mother palms conserved at Dong Go Coconut Research Center, leading to higher uniformity in these sample populations.

Among the exotic varieties, the Nei’s genetic diversity index value for the Talls (M = 0.249, SD = 0.072) was not significantly higher than for the Dwarfs (M = 0.125, SD = 0.051), t(3) = 2.662, p = 0.076. Previous studies suggested that the greater diversity in Tall varieties was due to allogamous breeding behaviors but the limited number of varieties in this study precludes a statistically significant conclusion (Rivera et al. 1999; Teulat et al. 2000). There was also no significant difference in Nei’s index between the seven Vietnamese Dwarfs (M = 0.319, SD = 0.066) and the three Vietnamese Tall and hybrid varieties (M = 0.260, SD = 0.086), t(3) = 1.063, p = 0.366, or between all Dwarf varieties (M = 0.230, SD = 0.116) and all Tall varieties (M = 0.255, SD = 0.071), t(15) = 0.575, p = 0.574. This result further underscored the high degree of genetic diversity in Vietnamese Dwarf populations compared to previously studied populations.
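
The comparisons above can be approximated from the reported summary statistics alone. The sketch below implements the unequal-variance (Welch) t statistic and the Welch–Satterthwaite degrees of freedom. The group sizes (three exotic Talls, six exotic Dwarfs) are inferred from the variety counts given in the text, and because the means and standard deviations are rounded, the result will only approximate the reported values.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and approximate degrees of freedom from summary data."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1
    v2 = sd2 ** 2 / n2  # squared standard error of group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Exotic Talls vs exotic Dwarfs, using the rounded values quoted in the text.
# ASSUMPTION: n = 3 Talls and n = 6 Dwarfs, inferred from the variety lists.
t_stat, dof = welch_t(0.249, 0.072, 3, 0.125, 0.051, 6)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

With these inputs the statistic comes out at roughly 2.67 with about 3 degrees of freedom, in line with the reported t(3) = 2.662; the small difference is expected from the rounding of the inputs.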

Across all varieties, the moderate ΦPT value (0.094, i.e. 9.41% of the total variance) obtained from the AMOVA indicated relatively little differentiation among coconut varieties. ΦPT is an analog of Wright’s fixation index (Fst). The value decreases with a greater degree of gene flow. A high value of ΦPT suggests significant barriers to inter-varietal breeding and, hence, high inter-varietal variance. By contrast, a lower value suggests the reverse, i.e. varieties are more similar to one another. Previous studies of coconut indicated that within-variety variation was higher than among-variety variation, ranging from 53% in Sri Lankan varieties (Perera et al. 1998), 60% in South Pacific varieties (Ashburner et al. 1997) and 67% in Mexican varieties (Zizumbo-Villarreal et al. 2006) to 98% in Kenyan varieties (Oyoo and Muhammed 2016). The within-population share of variation for the coconut varieties conserved at Dong Go Coconut Research Center was toward the higher end of this spectrum at 90.59%. This is primarily due to the diversity shared among the aforementioned Vietnamese Dwarf varieties, which in turn suggests a degree of introgression, especially among the Xiem family (XFD, XLGD, XGD and AROD).
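
The relationship between ΦPT and the two variance shares can be written out explicitly. The sketch below assumes the usual two-level AMOVA definition, in which ΦPT is the among-population variance divided by the total (among plus within) variance; the function name is illustrative, and the inputs are the percentages quoted above rather than the raw variance components, which are not given in the text.

```python
def phi_pt(var_among, var_within):
    """PhiPT as the among-population share of the total variance."""
    return var_among / (var_among + var_within)


# Percentages of variance reported in the text (among vs within varieties).
phi = phi_pt(9.41, 90.59)
print(round(phi, 3))
```

Because only the ratio matters, percentages and raw variance components give the same ΦPT.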

Both the PCoA plot and the UPGMA dendrogram delineated six non-native Dwarf varieties as a separate group. Both methods also indicated that the Vietnamese Dwarf varieties were similar to one another, and that the Tall varieties (WAT, SNRT, BAOT, TAAT01 and MACT02) could be separated into distinct clusters. Both methods suggested that the Vietnamese hybrid DAURT shares more alleles with the Dwarf variety XLGD. This result indicated that SCoT primers can provide useful information about the type and origin of coconut varieties, as suggested by the pioneering study on SCoT markers in coconut (Rajesh et al. 2015).

In conclusion, this study is one of the first to characterize Vietnamese coconut varieties using genetic markers. The high level of within-population genetic diversity revealed in Vietnamese coconuts, especially among the Dwarf varieties, suggested a higher degree of inter-breeding in Vietnamese Dwarfs compared to foreign Dwarf populations. This underscored the influence of traditional farming practice at small householder farms, where mixtures of coconut varieties are planted at high density among intercrops that attract pollinating insects. Future breeding efforts to improve yield and disease-resistance traits can benefit from further investigation of such diversity. SCoT markers with higher differentiation power were also identified, and these were able to resolve relationships among coconut varieties of Vietnamese and foreign origin, as well as among Dwarfs and among Talls. The SCoT-PCR system thus proved useful in the exploratory assessment of diversity in large coconut germplasm collections.