Introduction

Theobroma cacao L. is the most important species of the Theobroma genus. This species belongs to the Malvaceae family. It is a diploid plant (2n = 2x = 20 chromosomes) with a genome size ranging from 411 to 494 Mb (Argout et al. 2011). It is cultivated mainly for its seeds, which, after fermentation and drying, yield merchantable cocoa, a raw material used in chocolate, food, cosmetics and pharmacology (Bruno 2003). Côte d'Ivoire is the world’s leading producer of merchantable cocoa, with an annual production of approximately 2,200,000 tonnes (ICCO 2021), or 35% of the world supply. Cocoa farming plays an important role in the Ivorian economy, contributing 15% of gross domestic product (GDP) and 40% of export earnings.

However, cocoa farming faces a number of constraints potentially compromising its sustainability. These include the low level of use of improved planting material (Assiri et al. 2009), high pest pressure and climate change.

In Côte d'Ivoire, only hybrids are distributed to farmers as improved planting material. Clonal material is used only at the research station, as progenitors for hybrid production. Under these conditions, it is necessary to assess the legitimacy of the hybrids tested at experimental sites and distributed to farmers via seed gardens. This study, which focuses on crosses made at the CNRA research station in Divo (Côte d'Ivoire), aims to verify the legitimacy of the crosses involved in the production of cocoa hybrid families.

Materials and methods

The study was carried out at the CNRA (National Agronomic Research Center), Divo station (5°50′27.8″N; 5°21′30.1″W) in Côte d'Ivoire. The station is located approximately 17 km from Divo in the Lôh Djiboua region. The annual rainfall at the site ranges from 1200 to 1400 mm. The average temperature is 27 °C, and the relative humidity is 85% (Ehounou et al. 2019).

Plant material

The plant material is composed of 13 hybrid families resulting from crosses between 12 cocoa clones. These families, resulting from controlled manual pollination, were planted at the Divo research station in a control plot for the evaluation of drought resilience (Table 1).

Table 1 List of hybrid families and clones

Methods

Collection of leaf samples

A total of 57 leaf samples were collected from three trees per hybrid family and one tree per clone. Eight leaf discs were taken from green leaves of each genotype and inserted into sampling kits provided by the Chaâbouni Genomics Laboratory (LGC) in England. The kits containing the leaf samples were then sent to LGC for genotyping.

Genotyping

Genomic DNA was extracted at the Chaâbouni Genomics Laboratory (LGC) in England following a standardized mixed alkyl trimethyl ammonium bromide (MATAB) protocol as described by Pokou et al. (2009). Genotyping was carried out at LGC Genomics using 99 polymorphic single nucleotide polymorphism (SNP) markers identified on the 10 chromosomes of the cocoa genome (Argout et al. 2008).

Statistical analysis of the genotyping data

The legitimacy of the offspring from single crosses was studied by analysing genetic diversity within and between populations. Genetic diversity parameters were calculated using Genetix v4.05 (Belkhir et al. 2004), Fstat v2.9.3.2 (Goudet 1995) and GenAlEx v6.5 (Peakall and Smouse 2012). An analysis of molecular variance (AMOVA) was also performed using GenAlEx v6.5. The following parameters were determined: the number of alleles per locus (A), the percentage of polymorphic loci (P) at the 95% and 99% thresholds, observed (Ho) and expected (He) heterozygosity, total genetic diversity or total heterozygosity (Ht), the genetic differentiation index (GST), the heterozygote deficit within populations (FIS), and interpopulation genetic diversity (DST). Ht was estimated as the sum of the intrapopulation genetic diversity (HS) and the interpopulation genetic diversity (DST).
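For a single locus, the heterozygosity-based indices listed above can be illustrated with a minimal Python sketch. It is not the implementation used by Genetix, Fstat or GenAlEx: the function name `locus_stats` and the example genotypes are hypothetical, He is computed without any small-sample correction, and Fis is taken as 1 − Ho/He.

```python
from collections import Counter


def locus_stats(genotypes):
    """Return A, Ho, He and Fis for one locus.

    genotypes: list of 2-tuples of alleles, one tuple per diploid individual,
    e.g. ("A", "G"). Hypothetical helper written for illustration only.
    """
    n = len(genotypes)
    # Observed heterozygosity: share of individuals carrying two different alleles
    ho = sum(1 for a, b in genotypes if a != b) / n
    # Allele counts over the 2n gene copies
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = 2 * n
    # Expected heterozygosity under Hardy-Weinberg (uncorrected form)
    he = 1.0 - sum((c / total) ** 2 for c in counts.values())
    # Fixation index: negative values mean an excess of heterozygotes
    fis = 1.0 - ho / he if he > 0 else float("nan")
    return {"A": len(counts), "Ho": ho, "He": he, "Fis": fis}


# Made-up example: three heterozygous trees and one homozygous tree
example = [("A", "G"), ("A", "G"), ("A", "G"), ("A", "A")]
print(locus_stats(example))
```

In practice these values are averaged over all loci within each family or clone group before being compared.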

GST represents the fraction of total genetic diversity corresponding to the genetic difference between populations. This parameter was calculated using the formula GST = DST/Ht (Nei 1978). The intrapopulation heterozygote deficit (FIS) and FST, which estimates the proportion of interpopulation genetic diversity relative to total diversity (Weir and Cockerham 1984), were also determined. The Mann‒Whitney U test was performed using Statistica 7.1 software.
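The decomposition Ht = HS + DST and the ratio GST = DST/Ht can be sketched for one locus as follows. This is an illustrative calculation under simplifying assumptions (equal weighting of populations, no sample-size correction), not the routine implemented in the software cited above; the function name `nei_gst` and the allele frequencies are hypothetical.

```python
def nei_gst(pop_freqs):
    """Return (Hs, Ht, Dst, Gst) for one locus.

    pop_freqs: list of dicts mapping allele -> frequency, one dict per
    population. Populations are weighted equally (an assumption).
    """
    k = len(pop_freqs)
    alleles = set().union(*pop_freqs)
    # Hs: mean expected heterozygosity within populations
    hs = sum(
        1.0 - sum(p.get(a, 0.0) ** 2 for a in alleles) for p in pop_freqs
    ) / k
    # Ht: expected heterozygosity computed from the mean allele frequencies
    mean_freq = {a: sum(p.get(a, 0.0) for p in pop_freqs) / k for a in alleles}
    ht = 1.0 - sum(f ** 2 for f in mean_freq.values())
    # Dst: between-population component, so that Ht = Hs + Dst
    dst = ht - hs
    gst = dst / ht if ht > 0 else float("nan")
    return hs, ht, dst, gst


# Made-up frequencies for two populations at a biallelic SNP
hs, ht, dst, gst = nei_gst([{"A": 0.8, "G": 0.2}, {"A": 0.2, "G": 0.8}])
print(f"Hs={hs:.2f} Ht={ht:.2f} Dst={dst:.2f} Gst={gst:.2f}")
```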

The AMOVA was used to estimate the distribution of diversity within hybrid families and parent clones and to quantify genetic variation between individuals. It was also used to determine FST, which estimates the proportion of interpopulation genetic diversity relative to total diversity (Weir and Cockerham 1984). A principal coordinate analysis (PCoA) was carried out on 51 individuals to graphically represent each family and clone in a two-dimensional plane. This analysis is based on the dissimilarity matrix between populations. All these analyses were carried out using GenAlEx v6.5 software (Peakall and Smouse 2012).
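As an illustration of how a PCoA turns a dissimilarity matrix into coordinates, the classical procedure (double centring of squared distances followed by eigendecomposition) can be sketched with NumPy. This is a generic textbook version, not GenAlEx's own routine; the function name `pcoa` and the small distance matrix are hypothetical.

```python
import numpy as np


def pcoa(dist, n_axes=2):
    """Classical principal coordinate analysis of a symmetric distance matrix.

    Returns the coordinates on the first n_axes axes and the share of the
    positive eigenvalue sum carried by each of those axes.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distances (Gower's transformation)
    centring = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centring @ (d ** 2) @ centring
    # Eigendecomposition of the symmetric matrix, largest eigenvalues first
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scale eigenvectors by the square root of the (non-negative) eigenvalues
    kept = np.clip(eigvals[:n_axes], 0.0, None)
    coords = eigvecs[:, :n_axes] * np.sqrt(kept)
    explained = eigvals[:n_axes] / eigvals[eigvals > 0].sum()
    return coords, explained


# Made-up distances between three individuals lying on a line (0, 1 and 3)
example_dist = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
coords, explained = pcoa(example_dist)
print(coords)
print(explained)
```

The `explained` values correspond to the kind of percentage reported for Axes 1 and 2 of a PCoA plot.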

A phylogenetic analysis was carried out using DARwin v.5 software (Perrier and Jacquemoud-Collet 2006). This analysis involved producing a dendrogram via a clustering algorithm based on the sequential grouping of neighbouring genotypes. The robustness of the tree nodes was tested with 1000 bootstrap replicates on the basis of individual repeatability.

The genetic structure of hybrid families and parental clones was assessed from the single nucleotide polymorphism (SNP) marker data using STRUCTURE 2.3.4 software (Porras-Hurtado et al. 2013) to assign each individual to a genetic group. The optimal number of groups (K) was determined via the method of Evanno et al. (2005), evaluated on the Structure Harvester platform (http://taylor0.biology.ucla.edu/structureHarvester/) (Earl and von Holdt 2012); the admixture model (Lawson et al. 2018) was applied. In this analysis, K varied from 1 to 5, with five iterations of the analysis program. Among the five iterations, those with the highest Ln Pr(X|K) values were selected and represented as a coloured bar chart (Takrama et al. 2014). Each individual was then probabilistically assigned to a group using a Bayesian algorithm.
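The Delta K criterion of Evanno et al. (2005) can be sketched as follows. The version below uses the mean Ln Pr(X|K) across runs for the second-order difference and the standard deviation across runs at K as the denominator, which is an assumption about how the statistic is summarised; the function name `evanno_delta_k` and the log-likelihood values are made up for illustration and are not the values obtained in this study.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Return {K: Delta K} from replicate Ln Pr(X|K) values.

    lnp_by_k: dict mapping K -> list of Ln Pr(X|K), one value per run.
    Delta K is only defined where both K - 1 and K + 1 were also run.
    """
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in lnp_by_k or k + 1 not in lnp_by_k:
            continue
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # Delta K is undefined when replicate runs are identical
        # Absolute second-order rate of change of the mean log-likelihood
        second_order = abs(
            mean(lnp_by_k[k + 1]) - 2 * mean(lnp_by_k[k]) + mean(lnp_by_k[k - 1])
        )
        delta[k] = second_order / sd
    return delta


# Hypothetical Ln Pr(X|K) values, three runs per K, for K = 1 to 5
runs = {
    1: [-5000, -5002, -4998],
    2: [-4500, -4501, -4499],
    3: [-4450, -4460, -4440],
    4: [-4430, -4450, -4410],
    5: [-4420, -4440, -4400],
}
delta_k = evanno_delta_k(runs)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, best_k)
```

With K tested from 1 to 5, Delta K can only be computed for K = 2, 3 and 4, which is why the criterion cannot by itself support K = 1.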

The membership coefficient (Q value), which varies between 0 and 1, was used to designate the membership of a family or clone to a genetic group. If the Q value for a specific group was < 0.80, the family or clone was considered a hybrid; if the Q value was ≥ 0.80, the individual was considered a parental clone (Li et al. 2021).
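The 0.80 threshold rule described above can be expressed as a short Python sketch. The function name `classify_by_q`, the labels and the example Q values are hypothetical and serve only to make the decision rule explicit.

```python
def classify_by_q(q_values, threshold=0.80):
    """Assign an individual to its most likely cluster and label it.

    q_values: membership coefficients of one individual, one per cluster,
    summing to 1. Returns (cluster number starting at 1, label).
    """
    best = max(range(len(q_values)), key=lambda i: q_values[i])
    # Q >= threshold: treated as belonging to a single genetic group
    label = "parental type" if q_values[best] >= threshold else "hybrid"
    return best + 1, label


# Made-up membership coefficients for K = 2
print(classify_by_q([0.93, 0.07]))
print(classify_by_q([0.45, 0.55]))
```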

Results

Marker polymorphisms

The percentage of polymorphic markers (at the 95% threshold) was 100% for the parent clones. In the progenies, it ranged from 57.14% (UPA402 × UF676) to 92.66% (UPA409 × IFC1), with an average of 77.29% (Table 2).

Table 2 Percentage of polymorphism in each population

Genetic diversity of parent clones and hybrid families

The average number of alleles per locus (A) was 2, and the effective number of alleles (Ae) obtained between parent clones was 1.69. The Shannon diversity index value was 0.57. The expected heterozygosity (He = 0.39) was greater than the observed heterozygosity (Ho = 0.33). The value of the fixation index (Fis) was 0.18.

In the hybrid families, the average number of effective alleles (Ae) ranged from 1.57 (UPA402 × UF676) to 1.92 (UPA409 × IFC1), with an average of 1.77. The Shannon diversity index (I) was high for the UPA608 × IFC412 family, at 0.54. The expected heterozygosity was lower than the observed heterozygosity (He = 0.30 and Ho = 0.42). The fixation index (Fis) ranged from −0.52 (UPA402 × UF676) to −0.16 (POR × T50/501), with an average of −0.34 (Table 3).

Table 3 Values of genetic diversity parameters assessed within hybrid families and clones

Interpopulation genetic diversity

The analysis of molecular variance (AMOVA) revealed nonsignificant genetic differentiation between parents and families (P = 0.323). Only 1% of the total variance was attributed to interpopulation variance, whereas 99% was attributed to intrapopulation variance (Table 4).

Table 4 Distribution of genetic variation according to AMOVA for hybrid families and parents

Differentiation between hybrid families and parental clones

Fixation index values within populations (Fis = −0.29) and for all populations (Fit = −0.13) indicate excess heterozygosity. Moderate genetic differentiation between the clones and their progeny (FST = 0.12) was observed. The estimated gene flow (Nm) per population was 2.8 (Table 5).
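Wright's F statistics are linked by the relation (1 − Fit) = (1 − Fis)(1 − Fst), which gives a quick consistency check on the reported indices. The sketch below applies it to the rounded Fis and FST values given above; the function name `fit_from_fis_fst` is hypothetical, and the result is only approximate because the inputs are rounded to two decimals.

```python
def fit_from_fis_fst(fis, fst):
    """Overall fixation index from Wright's relation
    (1 - Fit) = (1 - Fis) * (1 - Fst)."""
    return 1.0 - (1.0 - fis) * (1.0 - fst)


# Rounded values reported in the text
fit = fit_from_fis_fst(fis=-0.29, fst=0.12)
print(round(fit, 2))
```

The relation yields a value of approximately −0.14, close to the reported Fit of −0.13; a small gap of this size is expected from rounding.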

Table 5 Mean values of genetic diversity parameters assessed within populations

Genetic differentiation coefficients (FST) and genetic distances (D) were calculated for each pair of hybrid families and clones. Both matrices revealed significant genetic differentiation and genetic distances between populations.

FST ranged from 0.035 (between F13 and F15) to 0.145 (between F1 and F3) (Table 6), whereas the genetic distances between families varied from 0.022 (between F13 and F15) to 0.153 (between F1 and F3) (Table 7).

Table 6 Matrix of FST genetic differentiation coefficients calculated between families and parents
Table 7 Matrix of genetic distances (D) calculated between families and parents

Principal coordinate analysis (PCoA)

The distributions of parent clones (purple) and progeny (black) along Axes 1 and 2 of the principal coordinate analysis (PCoA) are shown in Fig. 1. These two axes account for 37.29% of the total variability. The distribution of individuals in the factorial plane 1–2 reveals two main groups along Axis 1. Group I comprises parent clones SCA6, POR, ICS46, UF667, MOQ413 and ICS1 and their progeny. Group II comprises parents IMC67, T85/799, PA150, UPA402, UPA409 and IFC5 and their progeny. Axis 2 divides these two groups into four subgroups (I1, I2, II1, II2) according to the proximity of descendants and parent clones. Subgroup I1 comprises descendants close to the SCA6 parent. The remaining parents and descendants of Group I are grouped together in subgroup I2. Subgroup II1 includes parent clones IMC67, UPA409, UPA402, T85/799 and PA150 and their descendants. Subgroup II2 includes the parent clone IFC5 and its close descendants.

Fig. 1 Projection of Theobroma cacao hybrid families and parent clones in plane 1–2 of the principal coordinate analysis (PCoA)

Phylogenetic relationships

The hierarchical ascending classification (HAC), performed on the basis of the averages of descriptive characters, produced a dendrogram showing two major genetic groups, whose branches were supported by bootstrap values ranging from 3 to 98%.

The first group comprised six (red) clones (UF676, ICS46, POR, ICS1, MOQ413 and SCA6) and the following 6 progeny: (UPA402 × UF676), (UPA409 × POR), (POR × T60/887), (SCA6 × ICS1), (MOQ413 × SCA6) and (IFC720 × ICS46).

The second group was made up of six (red) progenitors (IFC5, IMC67, UPA402, T85/799 and PA150) and the following 8 offspring: (UPA608 × IFC412), (UPA409 × IFC1), (IMC67 × IFC1), (T85/799 × IFC15), (T79/501 × IFC5), (PA150 × POR) and (UPA603 × UF667) (Fig. 2).

Fig. 2 Dendrogram presenting the hierarchical ascending classification (HAC) of hybrid families and parents of cocoa trees using 84 single nucleotide polymorphism (SNP) markers

Genetic structuring of parental clones and families

The results of the Bayesian analyses used to determine the number of clusters within parents and descendants are shown in Fig. 3. Analysis of the figure revealed the existence of two major genetic groups (clusters) (K = 2).

Fig. 3 Graph showing the values of the Delta K statistic, indicating K = 2 as the optimum number of clusters

Figure 4 shows the genetic structure of the two populations (parents and offspring) and the contribution of each cluster to the constitution of the individuals. These clusters are associated with several control genotypes. Cluster 1 (red) is associated with the parent clones POR, SCA6, ICS46, ICS1 and UF667. Cluster 2 (green) includes the parent clones UPA402, UPA409, IMC67, T85/799, IFC5 and PA150. Sixteen hybrids had a strong genetic contribution from Cluster 1 (red), whereas 23 hybrid offspring had a strong genetic contribution from Cluster 2 (green).

Fig. 4 Genetic structure of hybrid families and parent clones of cocoa trees by the Bayesian method using 84 single nucleotide polymorphism (SNP) loci. Each color represents a genetic group, and each bar is a clone or hybrid family with the probability of belonging to a genetic group ranging from 0 to 1 (Q value)

Discussion

Studying the legitimacy of a few hybrid families resulting from single crosses is an approach that can be used to ensure conformity between parents and progenies resulting from manual pollination. In this study, 84 informative SNP markers were used to assess the genetic conformity of 39 offspring from 12 parent clones. Of the more than 1,000 SNP markers identified on chromosomes 1 to 10 of the cocoa genome by Argout et al. (2008), only 99 are commonly used for genotyping research because of their high level of polymorphism and discriminatory power.

The results of our study confirmed the polymorphism of the markers used: the SNP markers were polymorphic across both populations studied (parents and descendants). Our results are in line with those of Ji et al. (2013), who observed marker polymorphisms when assessing the genetic diversity and relatedness of cocoa varieties from Honduras and Nicaragua using 70 SNP markers.

In addition, nonsignificant genetic differentiation between parents and offspring (P = 0.323) was revealed. This finding indicates strong similarity between the parent clones and their offspring. Similar results (P = 0.285) were reported in a study of genetic diversity and relatedness between cocoa trees from Bogua and Utcubamba in northern Peru, in which 192 SNP markers were used (Danilo et al. 2022).

The effective number of alleles ranged from 1.57 to 1.92, with an average of 1.77, in the offspring, and the average number of alleles per locus was 2 in the parents. The results revealed high allelic richness in both populations. Numerous investigations on the diversity of T. cacao have also indicated significant allelic richness in this species. For example, diversity research on cocoa populations carried out by Danilo et al. (2022) reported averages of 1.52 and 1.59 alleles per population.

This allelic richness could also be attributed to the gametosporophytic self-incompatibility system characteristic of T. cacao (Royaert et al. 2011; Lanaud et al. 2017). This high allelic richness is also an asset for conservation strategies (Bataillon et al. 1996).

Our work also revealed a high contribution of intrapopulation genetic diversity (Hs = 99%) to total genetic diversity (Ht = 100%), with an interpopulation diversity of 1%. Because of their high diversity, parents and offspring from single crosses could constitute a valuable reservoir of genetic resources for many selection criteria (Ouédraogo et al. 2005). As a result, this plant material can be used in breeding programs aimed at genetic improvement of the cocoa tree for resistance to induced water deficit, for example. Indeed, the high diversity observed within these two populations could facilitate their integration within the recurrent and reciprocal selection program currently underway in Côte d'Ivoire (Pokou et al. 2009).

The expected heterozygosity (He) was 0.30 for the progeny population and 0.39 for the parental population. These values are close to those obtained by Gopaulchan et al. (2020) (He = 0.32), whose work focused on the genetic diversity of cocoa (Theobroma cacao L.) in Dominica via 180 SNP markers. On the other hand, these values are higher than the results from Padi et al. (2015) (He = 0.24), whose work on the utility of SNP fingerprints from 64 loci examined the diversity, mislabelling and parentage of 2,551 trees from six seed fields and hybrid progeny plots and farmer accessions in Ghana.

The fixation index values ranged from −0.52 to −0.16 for the hybrid progenies, and the value was 0.18 for the parental clones. The negative values observed in the progenies indicate excess heterozygosity, whereas the positive value observed in the parental clones indicates a slight heterozygote deficit (Wright 1965).

Low genetic differentiation (FST < 0.05) and genetic distances (0.03 < D < 0.09) were observed between hybrid families F2, F8, F10, F11, F12, F13 and F14 and their respective parents. Thus, the FST values and genetic distances (D) confirmed the proximity between these clones and their respective progenies. These results are similar to those of Jaime et al. (2017), who reported low genetic distances (0.06 < D < 0.07) between cocoa trees from Chucho and Beni in Colombia, indicating strong similarity between these two populations.

In addition, the principal coordinate analysis (PCoA) and the hierarchical ascending classification (HAC) revealed clustering of the parental clones SCA6, PA150, POR, IFC5 and IMC67 with their progenies. The clones used in this trial are therefore genetically close to their progeny. This proximity can be explained by the closely related genetic basis of the parental material. Similar results were reported by Koffi et al. (2022) in coconut palms. These authors explained that genetic differences or similarities in coconut trees (parents and offspring) are favoured by the contribution of genes from the male progenitor.

STRUCTURE analysis of hybrid families and clones revealed that families F9 (PA150 × IFC5), F15 (PA150 × POR), F12 (IMC67 × IFC1), F14 (POR × T50/501), F11 (IFC720 × ICS46), F8 (SCA6 × ICS1) and F13 (MOQ413 × SCA6) have membership rates (Q values) below 0.80 with the Maranon (PA150), Iquitos (IMC67), Criollo (POR and ICS46) and Contamana (SCA6) genetic groups. Membership rates below 0.80 indicate that these individuals are offspring of different genetic groups. Membership rates below 0.80 were also reported in a study comparing traditional Madagascar varieties with their progenitors (Li et al. 2021). In that study, varieties with Q values below 0.80 were considered to have a strong resemblance to their progenitors.

These results indicate that the seed production technique adopted by the CNRA in cocoa seed fields is reliable in guaranteeing the expected performance of these hybrids. This finding is consistent with the high level of satisfaction expressed by cocoa growers regarding the performance of hybrids distributed through CNRA seed fields.

Conclusion

The data obtained in this work, which aimed to verify the legitimacy of hybrids, revealed little differentiation between the hybrids and their parents for seven hybrid families, in addition to a high degree of allelic richness.

Therefore, these results constitute an asset for the large-scale dissemination of hybrid families presenting traits of agronomic and/or technological interest and for the conservation of these populations.