Theobroma cacao L. is unique among the 22 species of its genus in that it is exploited commercially on a large scale for the production of cacao beans. Cocoa butter and cocoa powder, which are extracted from the fermented and dried beans, are the main ingredients in the commercial manufacture of chocolate. Improving cocoa cultivation requires genetic material with higher productivity and greater resistance that also yields beans of good industrial quality (Figueira and Cascardo 2001). To obtain such characteristics, plant geneticists have made extensive use of molecular markers, especially microsatellites. Microsatellites are powerful genetic markers because of their abundance in eukaryotic genomes, high levels of polymorphism, Mendelian inheritance, co-dominance and locus specificity (Merdinoglu et al. 2005). In cocoa, microsatellites have been applied to DNA fingerprinting, genetic diversity studies, variety characterization and genetic mapping (Charters and Wilkinson 2000; Faleiro et al. 2004; Pugh et al. 2004; Saunders et al. 2004).

In 1999, Lanaud et al. developed the first set of simple sequence repeat (SSR) markers for T. cacao. More recently, Pugh et al. (2004) developed 387 new SSR markers for this species. However, all of these SSR loci were isolated by screening genomic libraries with dinucleotide probes. Consequently, SSR loci consisting of tri- and tetranucleotide repeats have yet to be explored in the T. cacao genome. The aim of the present study was to develop a new set of SSR markers that includes tri- and tetranucleotide repeats. To this end, we constructed 13 genomic libraries, each enriched for a different SSR motif, and used them to identify and characterize new microsatellites.

Genomic DNA was extracted from leaves of T. cacao (clone Scavina-6) following the method of Faleiro et al. (2002). The DNA (1.0 μg) was digested with the restriction enzymes AluI and HaeIII (New England Biolabs, Beverly, Mass.) at 37°C for 4 h. The digested DNA was electrophoresed on a 0.8% agarose gel, and fragments of 250 to 750 bp were excised and purified with the Wizard SV Gel kit (Promega, Madison, Wis.). The purified fragments were ligated to the “Hae adapter” (formed by annealing the oligonucleotides Hae1: 5′-CCATCCGCGGCTAGCAGCATAAAA-3′ and Hae2: 5′-ATGCTGCTAGCCGCGGATGG-3′) with T4 DNA ligase (Promega). Following ligation, 1-μl aliquots were amplified by PCR with Hae2 as the primer. The amplified samples were used to construct 13 SSR-enriched libraries by hybridization capture. After denaturation at 94°C for 5 min, each sample was incubated with one of 13 biotin-labeled SSR oligonucleotides: (AC)10, (AT)12, (GA)13, (GC)8, (GT)13, (TC)10, (CAA)7, (TAT)7, (AAT)7, (ATT)7, (GATA)5, (ACAG)5 or (GACA)5. Fragments containing SSR regions were captured with streptavidin-conjugated magnetic particles (Streptavidin MagneSphere Paramagnetic Particles; Promega). After elution, the enriched fragments were amplified by PCR with the Hae2 primer and cloned into the pBluescriptKS plasmid (Stratagene, La Jolla, Calif.). Of the 1536 candidate clones, 556 were picked for fluorescent DNA sequencing, and 222 of these were found to contain microsatellites. Of these 222 sequences, 123 were selected for primer design with the PRIMER 3 software (http://frodo.wi.mit.edu/cgi-bin/promer3primer3.bin).
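The adapter design can be checked directly from the two oligonucleotide sequences given above. The Python sketch below (function names are illustrative and not part of the original protocol) verifies that Hae2 is the reverse complement of the first 20 nt of Hae1. If the two strands anneal flush at the 5′ end of Hae1, which is an assumption of this sketch, the Hae adapter would be a 20-bp duplex carrying a single-stranded 5′-AAAA-3′ tail at the 3′ end of Hae1.

```python
"""Check how the Hae1/Hae2 oligonucleotides (sequences from the text) anneal."""

COMPLEMENT = str.maketrans("ACGT", "TGCA")

HAE1 = "CCATCCGCGGCTAGCAGCATAAAA"
HAE2 = "ATGCTGCTAGCCGCGGATGG"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def describe_duplex(top: str, bottom: str) -> tuple[int, str]:
    """Return (paired length, 3' overhang of `top`), assuming `bottom`
    anneals flush with the 5' end of `top` (illustrative assumption)."""
    rc = reverse_complement(bottom)
    if not top.startswith(rc):
        raise ValueError("strands do not pair flush at the 5' end of top")
    return len(rc), top[len(rc):]


if __name__ == "__main__":
    paired, overhang = describe_duplex(HAE1, HAE2)
    print(f"{paired}-bp duplex, 3' overhang on Hae1: 5'-{overhang}-3'")
```

Checking the pairing computationally in this way is a quick guard against transcription errors when oligonucleotide sequences are copied into ordering forms or new protocols.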

Of the 123 selected SSRs, 64 contained dinucleotide repeats, 45 trinucleotide repeats and 14 tetranucleotide repeats. The number of repeat units ranged from four (AG, AT, CG, CT, GA, TC, TG, AGG, CAA, CAG, CTG, TGC, TTG and GACA) to 46 (AG). The trinucleotide CAA was the most abundant repeat motif among the isolated T. cacao loci. None of the 123 new loci was homologous to cocoa SSR sequences previously deposited in the GenBank database.
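Screening sequenced clones for microsatellites and tallying them by motif class and repeat number, as summarized above, can be sketched with regular expressions. The Python example below is a minimal illustration, not the software used in this study. It reports perfect di-, tri- and tetranucleotide repeats with at least four tandem copies (an assumed threshold, chosen to match the smallest repeat number reported here), discards units that are really shorter repeats (for example "ACAC" or "AA"), and ignores compound or imperfect repeats. The clone sequence in the usage example is hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass

MOTIF_CLASS = {2: "di", 3: "tri", 4: "tetra"}


@dataclass
class SSR:
    motif: str
    start: int    # 0-based position in the clone sequence
    repeats: int  # number of tandem copies of the motif

    @property
    def motif_class(self) -> str:
        return MOTIF_CLASS[len(self.motif)]


def minimal_period(unit: str) -> int:
    """Length of the shortest string whose repetition yields `unit`
    (e.g. 2 for "ACAC", 1 for "AA")."""
    for p in range(1, len(unit) + 1):
        if len(unit) % p == 0 and unit[:p] * (len(unit) // p) == unit:
            return p
    return len(unit)


def find_ssrs(seq: str, min_repeats: int = 4) -> list[SSR]:
    """Report perfect di-, tri- and tetranucleotide repeats with at least
    `min_repeats` tandem copies (threshold is an illustrative assumption)."""
    seq = seq.upper()
    hits = []
    for k in MOTIF_CLASS:
        # A k-mer followed by at least (min_repeats - 1) identical copies.
        pattern = re.compile(rf"([ACGT]{{{k}}})\1{{{min_repeats - 1},}}")
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if minimal_period(unit) < k:
                continue  # e.g. "ACAC" is really a dinucleotide repeat
            hits.append(SSR(unit, m.start(), len(m.group(0)) // k))
    return sorted(hits, key=lambda s: s.start)


if __name__ == "__main__":
    clone = "GGCT" + "CAA" * 7 + "TTGC" + "AG" * 12 + "CC"  # hypothetical clone
    ssrs = find_ssrs(clone)
    for s in ssrs:
        print(s.motif_class, s.motif, s.repeats, s.start)
    print(Counter(s.motif_class for s in ssrs))
```

Dedicated SSR-search tools additionally handle imperfect and compound repeats and scan every reading frame; the regex approach above only shows the core idea.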

The new primers were tested by PCR on DNA from five cocoa accessions (EEG 29, EET 397, CCN 10, RB 39 and CAB 169), one accession of T. grandiflorum L. and one accession of Herrania sp. (a genus closely related to Theobroma), obtained from the Germplasm Collection of the Cocoa Research Center (CEPEC), Ilhéus, Bahia, Brazil. The reactions were performed in a 20-μl final volume containing 10 ng DNA, 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 2 mM MgCl2, 100 μM of each deoxyribonucleotide (dATP, dTTP, dGTP and dCTP), 0.2 μM of each primer (forward and reverse) and 1.0 U Taq DNA polymerase (Promega). Amplifications were carried out in a Mastercycler gradient thermocycler (Eppendorf) with the following program: 94°C for 4 min; 32 cycles of 94°C for 30 s, the annealing temperature (Ta) of each primer (see Table 1) for 1 min and 72°C for 1 min; and a final extension at 72°C for 7 min. The amplification products were electrophoresed on a high-resolution 4% agarose gel (Sigma, St. Louis, Mo.), and allele sizes were estimated by comparison with a 10-bp DNA ladder (Sigma). Of the 123 loci, 54 (43.9%) were polymorphic, 61 (49.6%) were monomorphic and eight (6.5%) produced no amplification product. Of the 115 microsatellites that amplified, 76 also amplified in T. grandiflorum and Herrania sp., and 16 amplified only in T. cacao. These new markers will be useful in cocoa breeding programs involving marker-assisted selection and in genetic diversity studies, and they have already been used for genome mapping.
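Scoring each locus as polymorphic, monomorphic or non-amplifying, and recording cross-amplification in the related taxa, can be written as a few simple rules. The Python sketch below rests on hypothetical conventions not stated in the text: polymorphism is judged among the five T. cacao accessions only, and a genotype is the set of allele sizes (bp) read from the gel, with an empty set meaning no product. The last function converts category counts into percentages of their total; it is applied here to the counts reported above (54, 61 and 8 of 123 loci). The example locus genotypes are invented.

```python
CACAO = ["EEG 29", "EET 397", "CCN 10", "RB 39", "CAB 169"]
RELATIVES = ["T. grandiflorum", "Herrania sp."]

# Hypothetical representation: accession -> set of allele sizes (bp);
# an empty set (or a missing key) means no amplification product.
Genotypes = dict[str, frozenset[int]]


def classify_locus(calls: Genotypes) -> str:
    """Polymorphism judged among the T. cacao accessions only (assumption)."""
    profiles = [calls.get(acc, frozenset()) for acc in CACAO]
    amplified = [p for p in profiles if p]
    if not amplified:
        return "no amplification"
    return "polymorphic" if len(set(amplified)) > 1 else "monomorphic"


def transferability(calls: Genotypes) -> str:
    """Describe which related taxa also yield a product for the locus."""
    if not any(calls.get(acc) for acc in CACAO):
        return "not amplified in cacao"
    hits = [bool(calls.get(taxon)) for taxon in RELATIVES]
    if all(hits):
        return "cacao and both relatives"
    if any(hits):
        return "cacao and one relative"
    return "cacao only"


def percentages(counts: dict[str, int]) -> dict[str, float]:
    """Express category counts as percentages of their total (1 decimal)."""
    total = sum(counts.values())
    return {k: round(100 * v / total, 1) for k, v in counts.items()}


if __name__ == "__main__":
    locus = {  # invented example genotypes
        "EEG 29": frozenset({150}),
        "EET 397": frozenset({150, 156}),
        "CCN 10": frozenset({150}),
        "RB 39": frozenset({150}),
        "CAB 169": frozenset({150}),
        "T. grandiflorum": frozenset({144}),
        "Herrania sp.": frozenset(),
    }
    print(classify_locus(locus), "|", transferability(locus))
    reported = {"polymorphic": 54, "monomorphic": 61, "no amplification": 8}
    print(percentages(reported))
    # -> {'polymorphic': 43.9, 'monomorphic': 49.6, 'no amplification': 6.5}
```

Keeping the scoring rule explicit in this way makes it easy to rerun the summary if, for example, polymorphism is later judged across all seven accessions rather than the cacao accessions alone.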

Table 1 Genetic characterization of microsatellite loci isolated from Theobroma cacao L.