Introduction

Diverse sectors of the economy have intensified their search for products derived from plants. In the medicinal sector alone, 25% of the global production of medicinal drugs is (directly or indirectly) plant based (Newman and Cragg 2020). The increasing search for alternative products is spurred by rising resistance in pathogenic microorganisms, reducing or nullifying the efficiency of medications (Medeiros et al. 2017). Similarly, in the agricultural sector, the development of resistance in plant pathogens, as well as the concern for food safety through the absence and/or reduction of residues in food, has intensified the search for bioproducts (Chojnacka 2015; Li et al. 2019).

The increase in the use of plants with bioactive properties, together with overexploitation, the pressure of anthropization, and the modification of natural environments, compromises the sustainable use and conservation of these species (Costa et al. 2015; Chen et al. 2016; Kadam and Pawar 2020; Silva-Júnior et al. 2020). Under these conditions, they are subject to reduction in their genetic base, to genetic imbalance of their populations, and to variations in effective population size (Santos et al. 2017), making adoption of conservation strategies essential. The implementation of Active Germplasm Banks (AGBs), conservation units of genetic material, is an alternative for the conservation of the genetic variability of a species outside of its natural location of occurrence (Engel and Visser 2003; Gulati 2018).

The selection of genotypes of interest that represent the genetic variability of a species should be performed based on parameters that estimate the genetic diversity and structure of its populations (Almeida-Pereira et al. 2017; White et al. 2018; Motta et al. 2019; Niu et al. 2019). For that purpose, single nucleotide polymorphism (SNP) markers have been successfully applied in various plant species (Otto et al. 2017). The SNPs result from a specific modification due to interchange of a single base (Grover and Sharma 2016). As they are abundant and widely distributed in the genomes (Ganal et al. 2009; Kethom et al. 2019), they allow more accurate estimation of genetic diversity in natural populations (Chung et al. 2017; Nadeem et al. 2018; Kethom et al. 2019), even in non-model species, such as medicinal and aromatic plants (Wang et al. 2016; Otto et al. 2017; Marakli 2018; Do et al. 2019; Brito et al. 2021).

Medicinal and aromatic plants produce secondary metabolites, such as essential oils, that can be composed of an immense variety of chemical compounds of considerable agronomic and medicinal interest (Trindade and Lameira 2014; Santos et al. 2016). Euphorbiaceae is one of the largest and most diverse families of angiosperms and, in Brazil, it is made up of more than 1000 species and 64 genera. The Croton genus is prominent, with about 1,200 species distributed predominantly in the Americas (Silva et al. 2009) and more than 300 species in Brazil (Caruzo et al. 2020), some with proven biological activities, such as anti-inflammatory (Kim et al. 2015), anticarcinogenic (Sampath et al. 2018), antioxidant (Embuscado 2015), antiprotozoal (Sharma et al. 2017), antifungal (Monteiro et al. 2017; Hailu et al. 2017), and antibacterial (Marchese et al. 2017) properties.

Croton grewioides Baill., popularly known as “alecrim-de-cabocla” or “canelinha” (SiBBr, 2020), is a plant native to Bolivia, Brazil, and Peru (Riina et al. 2021). In Brazil, it occurs in the Caatinga, Cerrado, and Atlantic Forest biomes, with distribution in the states of Alagoas, Bahia, Ceará, Goiás, Minas Gerais, Paraíba, Pernambuco, Piauí, Rio Grande do Norte, and Sergipe (Silva et al. 2009; Caruzo et al. 2020; SiBBr 2020). It is a diploid species (2n = 20) (Santos, 2016), monoecious, and has a racemiform inflorescence measuring between 1.3 and 5 cm in length with staminate and pistillate flowers, which are often visited by bees (Silva et al. 2009). It has a shrub growth habit (0.7–2 m) and is frequently found on rocky and sandy-clay outcrops in shrub and tree Caatinga ecoregions. It is widely used in popular medicine for relief of gastric problems and intestinal disorders, for cancer treatment, and for fever-reducing baths (Trindade and Lameira 2014). It therefore has significant biological potential, with antioxidant (Oliveira et al. 2021), acaricidal (Castro et al. 2019), antibacterial (Medeiros et al. 2017), and insecticidal (Camara 2008; Silva et al. 2008) activities already proven.

Considering the importance of the species and its agronomic and medicinal potential, due to the diversity of compounds it contains, the expectation for this study is to contribute information for designing strategies for conservation and use of this genetic resource. This study was conducted with the aim of studying the genetic diversity and structure of natural populations of C. grewioides in the Brazilian states of Bahia and Sergipe and selecting genotypes to compose the species collection in the AGB of Medicinal and Aromatic Plants of the Universidade Federal de Sergipe. To that end, SNP markers identified through the genotyping-by-sequencing (GBS) technology were used.

Materials and methods

Collection of plant material

Forty genotypes of C. grewioides Baill. were surveyed and collected in the states of Bahia (23 genotypes, BA population), in the municipality of Conceição do Coité, and Sergipe (17 genotypes, SE population), in the municipalities of Poço Verde and Poço Redondo.

The municipality of Conceição do Coité (11°35′78.4″ S, 39°18′99.6″ W) is in the northeast region of the state of Bahia, at 420 m above sea level. It has a semi-arid climate, with sandy-clay soils and mean temperature of 22.3 °C. The rainy period is from November to April, with mean annual rainfall of 585 mm (Souza et al. 2003). Poço Verde (10°55′17.8″ S, 37°06′04.1″ W) is in the extreme southwest of the state of Sergipe, at 273 m above sea level. It has a semi-arid climate, shallow sandy-clay soils with a rocky outcropping base, and high salinity. The rainy period is from March to July, with mean annual temperature between 25–26 °C and mean annual rainfall of 646.1 mm (Oliveira et al. 2016). The municipality of Poço Redondo (09°58′06.5″ S, 37°51′48.4″ W) is in the northwest of the state of Sergipe, at 225 m above sea level. It has a semi-arid climate, with shallow, rocky, and dry soils and rocky outcrops. The rainy season is from March to July, with mean temperature of 25.2 °C, and mean annual rainfall of 552.0 mm (Diniz et al. 2014; Bitencurti et al. 2017).

Young, fresh leaves were collected from each genotype, wrapped in sterile gauze, and placed on ice to prevent oxidation. The samples were stored at -80 °C, then freeze dried in a Terroni LS3000 series bench-top freeze dryer, and stored in a freezer until DNA extraction. The project is registered in the National System of Management of Genetic Patrimony and Associated Traditional Knowledge (Sistema Nacional de Gestão do Patrimônio Genético e Conhecimentos Tradicionais Associados, SISGEN) under number A8CCB3B.

DNA isolation, preparation, and sequencing of the genomic library

Genomic DNA was extracted from C. grewioides leaves through the extraction protocol using the 2% CTAB (cetrimonium bromide) buffer (Ferreira and Grattapaglia, 1998) in the Genetic Diversity and Plant Breeding Laboratory, Genetics Department of ESALQ-USP, Piracicaba, São Paulo, Brazil.

The genomic DNA of each genotype was quantified through electrophoresis in 1% (w/v) agarose gel and stained with SYBR Safe™ DNA (Invitrogen), comparing the DNA with the phage lambda molecular size standards (Invitrogen). After that, the DNA samples were standardized to a concentration of 50 ng/µl for preparation of the GBS (genotyping-by-sequencing) library, following the double digestion protocol, as described by Poland et al. (2012).

The genomic DNA of each genotype was initially digested with two restriction enzymes (PstI and MseI), selected after tests with different combinations of restriction enzymes. The fragments generated from each sample were ligated to individual barcode adaptors (Sigma-Aldrich, St. Louis, MO, USA) and to specific adaptors for the Illumina technology (Illumina, Certification Program, CSPro). The ligation products were joined in a single pool (multiplex), which was purified using the QIAquick PCR® purification kit and amplified by PCR to enrich fragments carrying both restriction sites. After that step, the library was once more purified, validated, and quantified in an Invitrogen™ Qubit v 3.0 fluorometer (Thermo Fisher Scientific, Carlsbad, CA, USA). The samples were then sequenced on the HiSeq3000 platform (Illumina, Inc., San Diego, CA, USA) with single-end configuration (1 × 100 bp).

Processing of raw sequences and filtering for identification of SNP markers

The sequences obtained were processed in the Stacks program (Catchen et al. 2013) using the pipeline for species that do not have a reference genome. First, the sequences were demultiplexed with the process_radtags module, which also removed low-quality sequences. The ustacks module was used to align the sequences and to form stacks of similar sequences for identification of loci with polymorphic nucleotides. The parameters used in this step were a minimum stack depth (-m) of 3 and a maximum distance allowed between stacks to include them in the same locus (-M) of 2. The loci of each sample were grouped in a catalog using the cstacks module, allowing a maximum distance of two nucleotides (-n = 2) between loci. The sstacks module was used to compare the stacks of each genotype with the catalog. After that, the rxstacks module was used to correct the individual genotypes by removing loci with low log-likelihood (--lnl_lim -10). Finally, the populations module was used to filter the SNP markers using the following parameters: minor allele frequency (MAF ≥ 0.05), minimum stack depth (m ≥ 3), and retention of a single marker per tag, occurring in at least 75% of the individuals in each population.
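The final filtering criteria can be illustrated with a minimal Python sketch. It is not the Stacks implementation: the genotype encoding (number of copies of the alternate allele, 0, 1, or 2, with None for a missing call), the function names, and the choice to compute the MAF on the pooled sample are all assumptions made for illustration, and the depth filter is omitted.

```python
def minor_allele_frequency(genotypes):
    """Minor allele frequency of a biallelic SNP in diploid individuals.

    genotypes: alternate-allele counts (0, 1, 2); None marks a missing call.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def call_rate(genotypes):
    """Fraction of individuals with a non-missing genotype."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def keep_snp(genotypes_by_pop, min_maf=0.05, min_call_rate=0.75):
    """Return True if the SNP passes both thresholds.

    genotypes_by_pop: dict mapping population name -> list of genotypes.
    The SNP must be called in at least `min_call_rate` of the individuals
    of every population; the MAF is computed on the pooled sample
    (an assumption of this sketch).
    """
    if any(call_rate(g) < min_call_rate for g in genotypes_by_pop.values()):
        return False
    pooled = [g for pop in genotypes_by_pop.values() for g in pop]
    return minor_allele_frequency(pooled) >= min_maf
```

For example, a SNP called in only half of the individuals of one population is discarded regardless of its allele frequency, and a SNP carried by a single heterozygote among 40 individuals (MAF = 0.0125) fails the frequency threshold.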

Genome diversity and structure

The estimate of genetic diversity of the C. grewioides populations was obtained through the observed (Ho) and expected (He) heterozygosities, the Shannon index (I), the inbreeding coefficient (f) of Wright (1965), and the number of exclusive alleles.
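As a rough illustration of these estimators, the sketch below computes them for a single biallelic locus. The function name and the genotype encoding (alternate-allele counts 0, 1, or 2 for diploid individuals without missing data) are hypothetical; the formulas used are the standard ones, He = 1 − Σp², I = −Σ p ln p, and f = 1 − Ho/He, and population-level values are assumed to be averages of such per-locus estimates.

```python
import math


def locus_diversity(genotypes):
    """Return (Ho, He, I, f) for one biallelic locus.

    genotypes: alternate-allele counts (0, 1, 2), one per diploid individual.
    """
    n = len(genotypes)
    p = sum(genotypes) / (2 * n)   # alternate-allele frequency
    q = 1.0 - p                    # reference-allele frequency
    ho = sum(g == 1 for g in genotypes) / n          # observed heterozygosity
    he = 1.0 - (p * p + q * q)                       # expected heterozygosity
    shannon = -sum(x * math.log(x) for x in (p, q) if x > 0)
    f = 1.0 - ho / he if he > 0 else float("nan")    # inbreeding coefficient
    return ho, he, shannon, f
```

A positive f indicates fewer heterozygotes than expected under Hardy–Weinberg proportions, and a value of zero indicates agreement with them.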

The analysis of molecular variance (AMOVA) was used to verify the distribution of genetic variance within and between the populations sampled, and its significance was tested with 10,000 permutations. The genetic structure between the populations was evaluated by the FST statistic of Weir and Cockerham (1984) and its significance was tested through 10,000 bootstraps. Analyses were performed through the GenAlEx program (Peakall and Smouse 2012).

The genomic structure, based on variation of the SNP markers, was evaluated by Discriminant Analysis of Principal Components (DAPC, Jombart et al. 2010) and by Bayesian analysis (Structure software). The DAPC was carried out with the adegenet 2.0.0 package (Jombart and Ahmed 2011) for R (R Core Team 2016). This analysis is free of genetic assumptions, such as Hardy–Weinberg equilibrium and linkage equilibrium within the groups (Jombart et al. 2010). The DAPC maximizes the difference between the groups and minimizes the difference within them (Roullier et al. 2013). From the information on discriminant functions, a bar graph (Compoplot) was generated, with the probabilities of assignment of each individual to the clusters formed (Jombart and Ahmed 2011).

Bayesian analysis was performed using Structure v.2.3.4 software (Pritchard et al. 2000). Values of genetic clusters (k) ranging from 1 to 5 were tested and, for each k, 10 independent replications were performed. Each run consisted of a burn-in period of 10,000 iterations, followed by 50,000 MCMC (Markov Chain Monte Carlo) iterations, assuming the admixture ancestry model and uncorrelated allele frequencies. The number of genetic groups (k) was identified by the ΔK method described by Evanno et al. (2005), implemented in the Structure Harvester software (Earl and Vonholdt, 2011). Accessions that presented membership values below 0.8 were considered to have mixed ancestry.
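The ΔK criterion can be sketched as follows. This is a minimal illustration rather than the Structure Harvester code: the function name is hypothetical, and the second-order difference is taken on the mean ln P(D) across replicate runs and divided by the standard deviation of ln P(D) at that K, which is one common way of computing the statistic.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Compute Delta K for every K that has both neighbours K-1 and K+1.

    lnp_by_k: dict mapping K -> list of ln P(D) values, one per replicate run.
    Returns a dict mapping K -> Delta K.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # undefined at the smallest and largest K tested
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        sd = stdev(lnp_by_k[k])
        if sd > 0:  # skip K values whose replicates are identical
            delta[k] = second_diff / sd
    return delta
```

With K tested from 1 to 5, ΔK is defined only for K = 2, 3, and 4, and the K with the largest ΔK is taken as the most likely number of groups.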

The genetic divergence among the genotypes was evaluated by means of the genetic distance of Rogers (1972) and visualized through construction of a dendrogram using the UPGMA (Unweighted Pair Group Method with Arithmetic Means) algorithm. Analysis was performed with the poppr package (Kamvar et al. 2014) for R (R Core Team 2016). Ten thousand bootstraps were performed to draw inferences regarding the reliability of the clusters. The final dendrogram was formatted in the program FigTree 1.4.1 (http://tree.bio.ed.ac.uk/software/figtree/). Principal component analysis (PCA) was performed using the dudi.pca function of the ade4 package (Dray and Dufour 2007) for R (R Core Team 2020).
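For reference, Rogers' distance between two individuals can be sketched as below. The sketch is not the poppr implementation: the function name, the genotype encoding (alternate-allele counts 0, 1, or 2 at biallelic loci), and the handling of missing data (loci missing in either individual are skipped) are assumptions. At each locus the distance is the square root of half the sum of squared allele-frequency differences, and the result is the mean over loci.

```python
import math


def rogers_distance(ind_a, ind_b):
    """Rogers' (1972) distance between two diploid individuals.

    ind_a, ind_b: alternate-allele counts (0, 1, 2) per locus; None = missing.
    """
    total = 0.0
    n_loci = 0
    for ga, gb in zip(ind_a, ind_b):
        if ga is None or gb is None:
            continue  # assumption: ignore loci missing in either individual
        freq_a = (1 - ga / 2, ga / 2)  # (reference, alternate) frequencies
        freq_b = (1 - gb / 2, gb / 2)
        total += math.sqrt(0.5 * sum((x - y) ** 2 for x, y in zip(freq_a, freq_b)))
        n_loci += 1
    if n_loci == 0:
        raise ValueError("no loci with data in both individuals")
    return total / n_loci
```

The resulting pairwise distance matrix is the input that an average-linkage (UPGMA) clustering would use to build the dendrogram.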

The 30 C. grewioides genotypes that compose the species collection in the Active Germplasm Bank of Medicinal and Aromatic Plants were selected with the maximum length sub tree function of the DARwin 6.0.010 software (Perrier and Jacquemoud-Collet 2006). This function removes redundant genotypes and allows selection of those that retain maximum genetic variability. The genetic distance among the genotypes, required by the function, was calculated from the simple matching dissimilarity index: \(d_{ij} = 1 - \frac{1}{L}\sum_{l = 1}^{L} \frac{m_{l}}{\pi}\), where \(d_{ij}\) is the dissimilarity between units \(i\) and \(j\); \(L\) is the number of loci; \(m_{l}\) is the number of matching alleles at locus \(l\); and \(\pi\) is the ploidy. A total of 10,000 bootstraps were performed.
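The simple matching dissimilarity can be illustrated with a short Python sketch. It is not the DARwin code: the function name and the representation of each genotype as a tuple of allele symbols are hypothetical. The number of matching alleles at a locus is taken as the size of the multiset intersection of the two genotypes, divided by the ploidy, and the dissimilarity is one minus the mean of these values over loci.

```python
from collections import Counter


def simple_matching_dissimilarity(ind_i, ind_j, ploidy=2):
    """Simple matching dissimilarity between two individuals.

    ind_i, ind_j: lists of genotypes, one per locus; each genotype is a
    tuple of allele symbols, e.g. ("A", "G") for a diploid heterozygote.
    """
    if len(ind_i) != len(ind_j) or not ind_i:
        raise ValueError("individuals must share a non-empty set of loci")
    matched = 0.0
    for geno_i, geno_j in zip(ind_i, ind_j):
        shared = Counter(geno_i) & Counter(geno_j)  # multiset intersection
        matched += sum(shared.values()) / ploidy
    return 1.0 - matched / len(ind_i)
```

For two diploid individuals scored at three loci as AA/AA, AG/GG, and CC/TT, the numbers of matching alleles are 2, 1, and 0, so the dissimilarity is 1 − (1 + 0.5 + 0)/3 = 0.5.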

Results

Sequencing of the GBS library resulted in a total of 272,866,599 reads. After quality control, demultiplexing, and filtering out the low-quality sequences, 6942 high-quality SNP markers were identified. These markers were used for characterization of the genetic diversity and structure of the 40 C. grewioides genotypes sampled from the states of Bahia (23 genotypes) and Sergipe (17 genotypes). The mean sequencing depth was 41.39X (± 17.71 sd), and within the SNPs detected, a greater proportion of transitions (55.75%) than transversions (44.25%) was observed. The most frequent mutations were A-G (28.35%) and C-T (27.40%).

The estimates of genetic diversity are shown in Table 1. The observed heterozygosity (\({H}_{o}\)) was 0.207 for the BA population and 0.215 for the SE population. These values were lower than the expected heterozygosity in both populations (0.230 for BA and 0.219 for SE). The inbreeding coefficient of the BA population (0.085) was higher than that of the SE population (0.025), and the Shannon Index was 0.346 for SE and 0.367 for BA. Exclusive alleles (Ae) were detected in both populations: 1296 in the BA population and 799 in the SE population.

Table 1 Genetic diversity estimates for two natural populations of Croton grewioides Baill. from the states of Bahia and Sergipe, Brazil (\(H_{o}\): observed heterozygosity; \(H_{e}\): expected heterozygosity; \(f = 1 - \frac{{H_{o} }}{{H_{e} }}\): inbreeding coefficient; \(I\): Shannon index; and \(Ae\): number of exclusive alleles)

Genetic variation was partitioned into two levels, within and between the populations, through AMOVA (Table 2). Most of the total genetic variation (93%) was observed within the populations, and 7% was detected between the populations analyzed. The estimate of the fixation index (FST) was 0.071, indicating moderate genetic differentiation between the BA and SE populations.

Table 2 Analysis of Molecular Variance (AMOVA) and fixation index (FST) for two populations of Croton grewioides Baill. sampled in the states of Bahia and Sergipe, Brazil

Discriminant Analysis of Principal Components (DAPC) allowed the 40 genotypes analyzed to be divided into two different groups, without overlap between them (Fig. 1a). The same color in different genotypes indicates that they belong to the same group. The first group was composed of the genotypes of the BA population, and the second of the genotypes of the SE population. In addition, the division of the 40 genotypes into two groups could be visualized in the assignment probability graph (Compoplot) (Fig. 1b), based on DAPC analysis.

Fig. 1 Discriminant Analysis of Principal Components (DAPC) of the 40 Croton grewioides Baill. genotypes sampled in the states of Bahia and Sergipe, Brazil. (a) Density plot considering the first discriminant function. Each bar represents an individual; (b) Bar graph illustrating the probabilities of assignment of each genotype. Each color represents a state (green: Bahia, BA; lilac: Sergipe, SE)

The Bayesian analysis (Fig. 2), performed using the Structure software, also allowed the division of the 40 genotypes studied into two groups. Group I (green) was composed of individuals from the BA population, and group II (lilac) was composed of individuals from the SE population, with the exception of individual Cg 33, which presented mixed ancestry (membership values less than 0.8).

Fig. 2 Estimated population structure of 40 genotypes of Croton grewioides Baill. for K = 2 according to the population of origin. Each color represents a state (green: Bahia, BA; lilac: Sergipe, SE)

The clustering of the 40 genotypes according to their population of origin was also observed with construction of the dendrogram by the UPGMA method, based on Rogers’ genetic distance (Fig. 3). The CGR18 and CGR21 genotypes exhibited greater genetic distance from the other genotypes sampled in the BA population. For the SE population, the CGR32 and CGR37 genotypes exhibited greater genetic distance from the other genotypes sampled. The Principal Component Analysis explained 16.61% of the total variation (Fig. 4) and allowed separating the genotypes according to the population of origin.

Fig. 3 Dendrogram with 40 Croton grewioides Baill. genotypes obtained by the UPGMA clustering method based on Rogers’ genetic distance (Rogers, 1972), with 10,000 bootstraps. Each color represents a state (green: Bahia, BA; lilac: Sergipe, SE)

Fig. 4 Principal components analysis (PCA) showing the 40 genotypes of Croton grewioides Baill. clustered into two groups. Each color represents a state (green: Bahia, BA; lilac: Sergipe, SE)

Use of the maximum length sub tree function of the DARwin 6.0.14 software allowed selection of 30 genotypes to compose the C. grewioides collection (Table 3). This function was used to remove the redundant genotypes, with the aim of retaining the largest proportion of the genetic variability detected in the populations analyzed in a smaller number of genotypes. Seventeen (17) individuals from the BA population and 13 individuals from the SE population were selected, which retained 99.38% of the total genetic variability detected in the two populations.

Table 3 Genotypes of Croton grewioides Baill. selected to compose the species collection in the Active Germplasm Bank

Discussion

The study of the genetic diversity and structure of natural populations is of utmost importance for designing strategies for conservation and use of species that have economic potential, such as C. grewioides Baill. SNP molecular markers have been an ally for carrying out these studies. SNPs are abundant in the genomes of plant species and, together with reduction in costs and increase in the ease of access to genotyping technologies, their use in genetic studies of natural populations has become frequent (Grattapaglia et al. 2011). In addition, advances in molecular techniques and in bioinformatics have allowed large-scale genotyping of species that do not have a reference genome. Thus, this is the first study performed with SNP markers for C. grewioides, based on GBS genomic libraries. Moreover, this is the first study carried out to analyze the genetic diversity and structure through molecular markers for the species.

Transition bias was detected through analysis of the sequences of the C. grewioides genome; that is, the percentage of transitions (55.75%) was greater than that of transversions (44.25%). Transition refers to DNA mutation in which there is interchange between the two purines (A and G) or between the two pyrimidines (C and T), whereas transversion is the interchange of a purine for a pyrimidine, or vice-versa (Guo et al. 2017). Transition bias frequently occurs in plants (Stern and Orgogozo 2008), as was observed in the present study and for the species Jatropha curcas L. (Gupta et al. 2012) and Croton tetradenius Baill. (Brito et al. 2021). Two explanations have been proposed for the transition bias. The first is that a transition distorts the DNA double strand less than a transversion does; consequently, transitions should occur with greater frequency during DNA replication. The second is that deamination, a chemical alteration of the nucleotides, leads to transitions (Zou and Zhang 2021).
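The distinction between the two classes of substitution can be made concrete with a minimal Python sketch; the function names and the representation of each SNP as a pair of bases are hypothetical and do not correspond to any step of the pipeline used in the study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref, alt):
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError("expected two different bases among A, C, G, T")
    # Transition: both bases are purines, or both are pyrimidines.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def transition_share(snps):
    """Proportion of transitions in a list of (ref, alt) base pairs."""
    n_transitions = sum(classify_snp(r, a) == "transition" for r, a in snps)
    return n_transitions / len(snps)
```

Only two of the six unordered base pairs (A-G and C-T) are transitions, so with equal substitution rates transitions would account for about one third of SNPs; a share above one half, as observed here, therefore reflects a clear bias.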

The C. grewioides populations analyzed showed lower genetic diversity than that expected by Hardy–Weinberg equilibrium (population under random crosses), since the observed heterozygosity was lower than the expected heterozygosity. This result is an indication of deficit of heterozygotes in the populations (Frankham 2003; Hartl and Clark 2010), as seen by the estimated value of the inbreeding coefficient (f: 0.056). The deficit in the frequency of heterozygotes may have been brought about by non-random crosses, as a result of crosses between related individuals and/or self-fertilization, or even as a result of the reproductive biology of the species. Monoecious species, such as C. grewioides (Silva et al. 2009), can self-fertilize, resulting in reduction in the heterozygotes in the population (Awad and Billiard 2017; Islam et al. 2020).

The values observed for the Shannon Index (I: 0.367 for Bahia; 0.346 for Sergipe) in this study were similar to those obtained for C. tetradenius (0.39–0.50), using ISSR markers (Almeida-Pereira et al. 2017); for Croton floribundus (0.31–0.43), using SSR markers (Silvestrini et al. 2015); and for Croton urucurana Baill. (0.322–0.342), using ISSR markers (Costa et al. 2020). This diversity index shows variation from 0 to 1, and the nearer the value to zero, the lower the genetic diversity of the population under analysis (Botrel et al. 2006; Gois et al. 2014; Silva et al. 2015).

The pattern of distribution of genetic variability within and among populations results from the interaction of several evolutionary factors. Selection, effective population size, and pollen and seed dispersal are of central importance for the distribution observed (Hamrick 1989). Characteristics intrinsic to the species, such as the reproductive system and floral morphology, help explain the genetic structure of natural populations. Allogamous species tend to have high genetic diversity within populations and little interpopulation variation (Hamrick 1989); 10–20% of the genetic variation for these species is found among populations. However, autogamous species can show up to 50% genetic variation among populations (Hamrick and Godt 1989).

From the results of AMOVA for the C. grewioides populations studied, it can be inferred that the species can cross fertilize, given the lower genetic variation between the populations (7%). Similar results were observed for the species C. urucurana (21.72%; Costa et al. 2020) and C. tetradenius (5%; Brito et al. 2021), in which lower genetic variation was observed among the populations than within the populations. The value observed for the FST statistic (0.071) indicates that the populations studied have moderate genetic differentiation (Silva-Júnior et al. 2020 apud Wright 1978; Leviyang and Hamilton 2011).

The pollen and seed dispersal pattern (gene flow) of the species is another factor that determines the population genetic structure (Zanella et al. 2012; Gulati 2018; Costa et al. 2020). The presence of exclusive alleles is an indication of restricted gene flow (Seoane et al. 2000) and was observed in the two populations analyzed. Of the total SNP loci evaluated, 18.67% were observed only in the BA population and 11.51% only in the SE population.

The pistillate flowers of the species of the Croton genus have nectar-producing glands around the ovary that attract insects (Thaowetsuwan et al. 2020). Thus, in general, species of the Croton genus are associated with dispersal agents and pollinators that move over restricted distances, such as ants, flies, wasps, bees, and butterflies (Passos and Ferreira 1996; Lôbo et al. 2011; Nascimento et al. 2014; Silva et al. 2020). In addition, the Croton seeds have a structure called a caruncle, which is rich in lipids (Leal et al. 2007). Ants consume this structure and leave the seeds exposed, generally near the mother plant, contributing to the spatial genetic structure observed (Vander Wall and Longland 2004), which, in turn, can favor crosses between related individuals.

Furthermore, the C. grewioides populations studied are in areas that have marked climatic seasonality and low rainfall, characteristics of arid and semi-arid regions. These conditions favor abiotic dispersal modes (such as autochory and anemochory), which are very common in dry periods (Silva and Rodal 2009; Silva et al. 2013; Souza et al. 2013; Quirino and Machado 2014). Pollination through wind, insects, or both has been reported (Dominguez and Bullock 1989; Pires et al. 2004; Quirino and Machado 2014; Costa et al. 2020) for the Croton genus. Autochory was also observed for the species Croton blanchetianus Baill. (Silva and Rodal 2009; Silva et al. 2013), C. floribundus (Silvestrini et al. 2015), C. heliotropiifolius Kunth (Silva et al. 2013), and Croton rhamnifolioides Pax & K.Hoffm. (Silva and Rodal 2009). Consequently, species of the Croton genus that have this type of dispersal tend to have limited seed spatial dispersal (Passos and Ferreira 1996; Narbona et al. 2005; Silva et al. 2013), and this factor may have contributed to the genetic variability distribution pattern observed among the populations.

The multivariate analyses performed (DAPC, Bayesian analysis, dendrogram based on Rogers’ genetic distance, and PCA) followed the same pattern of division of the 40 genotypes, which were separated according to population of origin. The bar graph, based on DAPC analysis, indicates the probability of assignment of the genotypes to the different groups (Jombart et al. 2010), suggesting that there is wide agreement between the genetic structure and the locations of origin of the genotypes. Thus, the differentiation observed in the multivariate analyses for the populations studied may have been affected by the geographic distance between them (325 km). In addition, forest fragmentation and unsustainable exploitation of plant resources contribute to reduction in population size and, consequently, can reduce the effectiveness of pollination and seed dispersal by insects and wind (Brito et al. 2021).

The presence of exclusive alleles in both populations corroborates these observations, since populations isolated for a long time can accumulate exclusive alleles, and this is reflected in the genetic divergence observed (Prentice et al. 2003). The moderate genetic differentiation, and the observation of genotypes with mixed ancestry in Bayesian analysis, may portray the historic gene flow between the populations under study, and this therefore suggests that these populations may become more genetically distant over the generations (Bohonak 1999; Pence 2017; Silva-Júnior et al. 2020).

Due to the economic potential of the species, the results obtained in the present study were essential for delineating collection of genotypes to compose the collection of the species in the active germplasm bank, with allelic diversity that represents the populations of origin (Blank et al. 2019). Considering that the genetic variability present in the populations can reflect the chemical variation of the genotypes (Costa et al. 2020), the use of the maximum length sub tree function of the DARwin 6.0.14 software allowed the selection of 30 C. grewioides genotypes that contained the largest proportion of genetic variability detected in the populations analyzed (99.38%).

The conservation of samples that represent the genetic variability of the species is essential for identification and selection of genotypes of interest, which may be used directly or in breeding programs aiming at development of cultivars (Blank 2013; Blank et al. 2015; Souza et al. 2015; Brito et al. 2016). Combining molecular, morphological, and chemical data is of utmost importance for efficient use of genetic resources selected to compose the species collection in the AGB.

Conclusion

Considering the economic potential of C. grewioides, the results of genetic diversity and structure obtained in this study provide essential information for designing strategies for conservation and use of the species. The greater proportion of genetic variation detected within the populations and the moderate genetic structure between the populations show the need for conservation of the genotypes surveyed in the two states. In addition, given the breadth of the natural range of the species, populations should be surveyed in other locations to expand the genetic base to be conserved in the collection.