1 Introduction

The plastome of photosynthetic angiosperms ranges from ∼107 kb (Solórzano et al. 2019) to ∼242 kb (Weng et al. 2016), and the vast majority have a quadripartite structure consisting of one LSC (large single copy) and one SSC (small single copy) region separated by two IRs (inverted repeats). It encodes approximately 80 protein-coding genes, 30 tRNAs, and four rRNAs found in most plant groups (Mower and Vickrey 2018). However, some plants, mostly heterotrophic ones, have rearranged plastomes with large-scale gene losses (see review by Wicke and Naumann 2018).

Some autotrophic plants have undergone different levels of plastome restructuring, from large-scale gene losses to structural rearrangements. Loss of the ndh complex, for example, has been reported for Orchidaceae (Kim et al. 2015) and the subfamily Cactoideae (Sanderson et al. 2015; Solórzano et al. 2019). The ndh complex is the most affected gene group, and its loss is related to the first step of genetic losses in heterotrophic plants (Barret et al. 2014). Extensive changes were reported for the IRs of Erodium (Geraniaceae), including increase, decrease, and total loss of the IR within a single genus (Blazier et al. 2016). Losses of IRs were reported for Tahina spectabilis (Arecaceae, Barrett et al. 2015), Passiflora (Passifloraceae, Cauz-Santos et al. 2020), the IR-lacking clade (IRLC) of Leguminosae (Choi et al. 2019), the Putranjivoid clade (Lophopyxidaceae and Putranjivaceae, Jin et al. 2020), as well as for Carnegiea gigantea (Cactoideae, Cactaceae, Sanderson et al. 2015). The order Caryophyllales, to which Cactaceae belongs, is characterized by a large number of structural changes in the plastome (Yao et al. 2019), including changes in plastome size, movements, and IR losses, in addition to gene losses along the evolution of that lineage.

Cactaceae is a family of succulent perennial plants comprising over 1450 species classified into 127 genera (Barthlott and Hunt 1993; Hunt et al. 2006). The species of this family occur predominantly in warm and dry regions of the Americas, except for Rhipsalis baccifera, which also extends to Africa and Southeast Asia (Hernández-Hernández et al. 2014; Judd et al. 2015). This family originated in South America (Hershkovitz and Zimmer 1997; Edwards et al. 2005; Nyffeler 2007; Arakaki et al. 2011) and diversified approximately 48.3 Mya (Silva et al. 2017), throughout the mid- to late Miocene and into the Pliocene (Arakaki et al. 2011). Most cactus species occur in semi-arid locations; however, some species are adapted to epiphytic microhabitats under high precipitation regimes in tropical forests (Barthlott and Hunt 1993).

Morphologically, cacti are characterized by a succulent stem and the presence of areoles. Their habit ranges from large trees to small geophytes and includes shrubby, scandent, and epiphytic forms (Barthlott and Hunt 1993). Cactus leaves are transformed into thorns grouped in areoles, although in Pereskia normal laminar leaves are found alongside thorns. Cactus species share an outstanding physiological attribute, crassulacean acid metabolism (CAM photosynthesis), which enables more efficient use of stored water, together with morphological characteristics such as succulence of the vegetative body, which stores water and allows survival through drought periods (Lüttge 2004; Mauseth 2006).

Cactaceae is a monophyletic family comprising four subfamilies: Cactoideae, Maihuenioideae, Pereskioideae, and Opuntioideae (Anderson 2001; Nyffeler 2002; Hunt et al. 2006; Mauseth 2006). Cactoideae contains the largest number of genera and species and the largest diversity of growth forms and habits (Barthlott and Hunt 1993; Anderson 2001). Its adaptations to different environments range from columnar and arborescent cacti in dry tropical forests to epiphytes with cylindrical or flattened stems in humid tropical forests of Central and South America, and also include a variety of globose forms, both independent and caespitose, spread over arid and semi-arid regions (Hernández-Hernández et al. 2011). Cactoideae is subdivided into eight tribes: Browningieae, Cacteae, Cereeae, Hylocereeae, Notocacteae, Pachycereeae, Rhipsalideae, and Trichocereeae. The BCT clade (Browningieae, Cereeae, and Trichocereeae) consists mainly of arborescent cacti from South America, although it includes some globose representatives, such as Melocactus (Nyffeler 2002; Hernández-Hernández et al. 2011).

Cactaceae comprises species of ornamental importance for their beauty and rusticity, which are gaining space in the ornamentation of public areas, mainly in arid regions where water is scarce (Pérez-Molphe-Balch et al. 2015). Leaves, cladodes, and fruits of some species are consumed in gastronomy (Coradin et al. 2011; Santiago and Coradin 2018). In the northeastern region of Brazil, cacti are also used as animal feed, mainly in times of food shortage (Santana Neto et al. 2015). Some exotic species are cultivated (Santos et al. 2006), such as Opuntia ficus-indica and Nopalea cochenillifera, which possibly originated in Mexico (Hunt et al. 2006), while native Brazilian species such as Cereus jamacaru, Pilosocereus gounellei, and Melocactus zehntneri suffer from extractive harvesting (Souza and Pacheco 2019). Of the 37 Melocactus species listed by the IUCN, a number that underrepresents the diversity of the genus, 14 belong to some threat category (IUCN 2021).

Melocactus (ca. 50 species) and Discocactus (14 species) (Hunt et al. 2006; Taylor 1991; Taylor and Zappi 2018) are sister genera, positioned in the subfamily Cactoideae within the BCT clade (Hernández-Hernández et al. 2011; Silva et al. 2017). Both share a characteristic globose body and a distinctive reproductive structure called the cephalium, but differ in their anthesis, which is diurnal in Melocactus and nocturnal in Discocactus (Taylor and Zappi 2004; Hunt et al. 2006). Melocactus has a neotropical distribution from the Antilles to eastern Brazil, while Discocactus is restricted to South America (Hunt et al. 2006; Taylor 1991; Taylor and Zappi 2018). Discocactus and Melocactus diverged approximately 3.81 Mya (Silva et al. 2017). This recent origin is probably associated with low nucleotide diversity and with unsuccessful attempts to resolve the molecular phylogenetic relationships of Melocactus (Taylor et al. 2014). Currently, there are no molecular phylogenetic studies for Melocactus, although it has been used as an outgroup (Ritz et al. 2007; Calvente et al. 2016) and in studies that investigated intergeneric relationships in Cactaceae (Hernández-Hernández et al. 2011; Fatinati et al. 2021). Thus, a few species have some sequences available, such as atpB-rbcL, petL-psbE, psbD-trnT, rpl16, trnK-rps16, trnL-trnF, trnL-trnT, trnS-trnG, ycf1, and PhyC (see Ritz et al. 2007; Calvente et al. 2016; Fatinati et al. 2021).

In our study, we described the plastomes of Discocactus bahiensis and Melocactus ernestii (Cactoideae, Cactaceae), two species native to the Brazilian semi-arid region. The first de novo assembly of chloroplast genomes for these genera is presented. We conducted a detailed characterization of their structural organization based on the number of genes and their relative location in the chloroplast genome and compared it to ten other representatives of Cactoideae, as well as to canonical plastomes of the related families Portulacaceae (Portulaca oleracea) and Amaranthaceae (Spinacia oleracea). We aimed to answer three questions: (1) Are the plastomes of D. bahiensis and M. ernestii structurally rearranged, and to what degree when compared to canonical plastomes and other Cactaceae? (2) What are the possible structural causes for rearrangements in Cactoideae? (3) Which are the most informative regions for phylogenetic reconstruction in the BCT clade? We discussed the polymorphisms in Cactaceae plastomes and proposed polymorphic regions suitable for infra- and intergeneric phylogenetic analyses in Discocactus, Melocactus, and related groups.

2 Material and methods

Plant material and DNA extraction

Plant material was collected in situ (ESM 1), and vouchers were deposited at the Herbarium Jayme Coelho de Moraes (EAN). Total DNA was extracted from the roots of D. bahiensis and from epicarp and roots of M. ernestii (ESM 1), dried on silica gel, according to Doyle and Doyle (1987) and modifications by Ferreira and Grattapaglia (1998). DNA quality was determined by 1% agarose gel electrophoresis and DNA quantity by NanoDrop 2000 (Thermo Scientific, Delaware, USA).

Genome sequencing, assembly, gene annotation, and codon usage analysis

Samples of D. bahiensis and M. ernestii were used for next-generation sequencing (NGS). PCR free libraries with fragments of 500 bp were prepared and sequenced by BGI (China). One gigabase paired end reads 250-bp long were generated using the HiSeq 4000 BGISeq500 platform (Illumina Inc.).

Assembly of the plastid genomes of D. bahiensis and M. ernestii was conducted with NOVOPlasty 3.3 (Dierckxsens et al. 2016) using k-mers of 39 bp. We selected the trnS-G region (KT717378.1) of Melocactus conoideus, which is located at the centre of the LSC of the analysed species as seed. The complete plastomes of D. bahiensis and M. ernestii were submitted to the GenBank with accession numbers BankIt2497373 (OK037110) and BankIt2497373 (OK037111), respectively.

Discocactus bahiensis and M. ernestii assembled contigs were annotated with the GeSeq (organellar genome annotation) pipeline (Tillich et al. 2017) against the references of C. gigantea (NC_027618.1), Mammillaria albiflora (Cactoideae, Cactaceae) (MN517610.1), Mammillaria pectinifera (MN519716.1), Mammillaria crucigera (MN517613.1), Mammillaria huitzilopochtli (MN517612.1), Mammillaria solisioides (MN518341.1), Mammillaria supertexta (MN508963), Mammillaria zephyranthoides (MN517611.1), Opuntia quimilo (Opuntioideae, Cactaceae) (MN114084), Portulaca oleracea (NC_036236), and Spinacia oleracea (NC_002202). Manual adjustments were made to start and stop codons using Geneious 10.2.6 (Kearse et al. 2012). Pseudogenes were classified based on deletions in their sequences and the presence or absence of internal stop codons. Annotations of tRNA genes were confirmed with the tRNAscan-SE software (Lowe and Chan 2016). Geneious 10.2.6 (Kearse et al. 2012) was used to analyse codon usage and GC content. Finally, we used OGDRAW (Greiner et al. 2019), freely available online in the CHLOROBOX platform, for circular illustration of the annotated plastomes.
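As an illustration of the two summary statistics mentioned above, GC content and codon usage can be computed in a few lines. This is a minimal sketch, not the Geneious implementation; the function names and the toy coding sequence are hypothetical, and the sequence is assumed to be in frame from its first base.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Return the GC percentage, counting only unambiguous A/C/G/T bases."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def codon_usage(cds: str) -> Counter:
    """Count codons in a coding sequence read in frame from its first base."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


# Hypothetical toy CDS (not taken from either plastome).
toy_cds = "ATGGCTGCTAAATAA"
print(f"GC content: {gc_content(toy_cds):.2f}%")
print("Codon counts:", dict(codon_usage(toy_cds)))
```

Applied per region (LSC, SSC, IR), the same GC function would give values comparable to those summarized in Table 1, provided the region boundaries are taken from the annotation.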

Comparative analysis

We compared the two plastomes assembled to other plastomes from Caryophyllales. Structural changes were detected with MAUVE (multiple alignment of conserved genomic sequence with rearrangements) tool (Darling et al. 2004), gene rearrangements with GRIMM (Genome Rearrangements in Man and Mouse) (Tesler 2002), and the number, type, and extension of repeated regions with Find Repeats tool implemented in Geneious 10.2.6 (Kearse et al. 2012). The search for polymorphic regions was performed by DNA Sequence Polymorphism Version 6.12.03 × 64 (Rozas et al. 2017).

To visualize the structural rearrangements that occurred in Discocactus and Melocactus plastomes, we used the MAUVE tool in Geneious 10.2.6. It displays the order and orientation of segments and facilitates comparative analyses of genomes. The comparisons were performed to phylogenetically distant (S. oleracea) and phylogenetically close (P. oleracea) members of Caryophyllales, both considered canonical plastomes, and to Cactaceae members, all of them presenting rearranged plastomes: C. gigantea, Rhipsalis baccifera (MT821847.1), Rhipsalis teres (MT387452.1), besides the three structural variants documented in Mammillaria Haw. (Solórzano et al. 2019), group 1: M. albiflora and M. pectinifera; group 2: M. crucigera, M. huitzilopochtli, M. solisioides, M. supertexta, and group 3: M. zephyranthoides. For better visualization of the syntenic blocks, we excluded the IRa from all plastomes analysed.

To visualize the contribution of genes and intergenic regions to the total size of the plastomes of D. bahiensis and M. ernestii, we compared them with plastomes of other members of the subfamily Cactoideae (C. gigantea, M. albiflora, M. supertexta, M. zephyranthoides, and R. teres) and with canonical plastid genomes (P. oleracea and S. oleracea). Total plastid genomes, gene regions, and intergenic regions were analysed separately in Geneious 10.2.6 after manual data extraction.
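The partition of a plastome into genic and intergenic base pairs can be sketched as an interval-merging problem. The example below is an assumption-laden illustration rather than the procedure used in Geneious: gene coordinates are taken as 1-based inclusive intervals on a linearized genome, genes spanning the origin are not handled, and the function names and toy coordinates are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Extend the previous interval instead of double-counting bases.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def genic_intergenic(genome_length, gene_intervals):
    """Return (genic_bp, intergenic_bp) for a linearized genome."""
    merged = merge_intervals(gene_intervals)
    genic = sum(end - start + 1 for start, end in merged)
    return genic, genome_length - genic


# Hypothetical toy annotation: two overlapping genes and one isolated gene.
genic, intergenic = genic_intergenic(1000, [(1, 100), (90, 200), (500, 600)])
print(genic, intergenic)
```

Merging before summing matters because overlapping genes would otherwise be counted twice, inflating the genic fraction.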

For analysis of rearrangements of gene orders in syntenic blocks, plastomes were analysed using GRIMM to infer the most parsimonious number of rearrangements. After removal of the IRa, plastomes of M. ernestii (present study) and P. oleracea were aligned using MAUVE (Darling et al. 2004). Syntenic blocks of P. oleracea and M. ernestii were extracted and analysed.
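To make the idea of counting inversions between two block orders concrete, the sketch below counts breakpoints in a signed permutation and derives a lower bound on the number of inversions. It is not the algorithm implemented in GRIMM, which computes the exact reversal distance; it only illustrates why a scrambled block order requires several inversions. The block order is treated as linear, the reference order is assumed to be 1..n with positive signs, and the function names and example are hypothetical.

```python
import math


def count_breakpoints(signed_order):
    """Count breakpoints of a signed block order relative to the identity 1..n.

    The order is framed with 0 and n + 1; an adjacency (a, b) is conserved
    only when b - a == 1, which also accounts for block orientation.
    """
    n = len(signed_order)
    extended = [0] + list(signed_order) + [n + 1]
    return sum(1 for a, b in zip(extended, extended[1:]) if b - a != 1)


def inversion_lower_bound(signed_order):
    """Each inversion removes at most two breakpoints, so d >= ceil(b / 2)."""
    return math.ceil(count_breakpoints(signed_order) / 2)


# Hypothetical example: blocks 2 and 3 inverted together.
order = [1, -3, -2, 4, 5]
print(count_breakpoints(order), inversion_lower_bound(order))
```

The bound is tight for this toy case (one inversion restores the reference order), but in general the true reversal distance can exceed it.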

In order to find small, repetitive regions in the plastomes, we used the Find Repeats tool implemented in Geneious 10.2.6, following the criteria established by Jin et al. (2020): minimum size of 30 bp and zero mismatches.
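The criterion of exact repeats with a minimum length can be illustrated by indexing every substring of the minimum length and keeping those that occur more than once. This is a simplified sketch and not the Geneious Find Repeats algorithm: it reports only forward repeats of exactly the minimum length, without extending them to maximal repeats or searching the reverse complement, and the function name is hypothetical.

```python
from collections import defaultdict


def exact_repeats(seq, min_len=30):
    """Map each min_len-mer occurring more than once to its 0-based starts."""
    seq = seq.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - min_len + 1):
        positions[seq[i:i + min_len]].append(i)
    return {kmer: starts for kmer, starts in positions.items() if len(starts) > 1}


# Hypothetical toy sequence: a 30-bp unit duplicated around a 5-bp spacer.
unit = "ACGTACGGTTCAAGCTTAGGCTAACCGTTA"
toy = unit + "TTTTT" + unit
print(exact_repeats(toy)[unit])
```

Requiring zero mismatches means a plain dictionary lookup is sufficient; tolerating mismatches would need an alignment-based or seed-and-extend approach instead.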

Comparative analyses of polymorphic regions were performed using the plastomes of D. bahiensis and M. ernestii for developing primers for Sanger sequencing. The plastomes were aligned and subsequently compared by DNA Sequence Polymorphism Version 6.12.03 × 64 (Rozas et al. 2017), with 600-bp window length and 200-bp step size. The first thirty windows were analysed separately in search of polymorphic regions for Sanger sequencing. Eligible regions should fulfil the following criteria: 1st) to be comprised between 700 bp and 1100 bp; and 2nd) to have primers located in gene regions without polymorphisms between the species analysed. For comparison, we analysed regions widely used in phylogenies for the Cactaceae family (rpl16, trnL-F, and trnK/matK) in Discocactus and Melocactus using the DNA Sequence Polymorphism Version 6.12.03 × 64, formerly used by Hernández-Hernández et al. (2011).
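The sliding-window scan can be sketched as follows. With only two aligned sequences, nucleotide diversity (pi) in a window reduces to the proportion of sites that differ. The code is an illustrative approximation and not the DnaSP implementation: it assumes that alignment columns containing a gap are excluded, that windows are counted in alignment columns, and that a trailing partial window is dropped; the function name and toy sequences are hypothetical.

```python
def windowed_pi(seq_a, seq_b, window=600, step=200, gap="-"):
    """Return (start, pi) per window for two aligned sequences.

    For two sequences, pi is the proportion of differing sites; columns
    containing a gap in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(seq_a) - window + 1, step):
        diffs = sites = 0
        pairs = zip(seq_a[start:start + window], seq_b[start:start + window])
        for a, b in pairs:
            if a == gap or b == gap:
                continue
            sites += 1
            diffs += a.upper() != b.upper()
        results.append((start, diffs / sites if sites else 0.0))
    return results


# Hypothetical toy alignment with one substitution, scanned with a small window.
print(windowed_pi("AAAAAAAAAA", "AAAATAAAAA", window=4, step=2))
```

With the 600-bp window and 200-bp step used here, each site contributes to up to three overlapping windows, which smooths the profile and helps to delimit regions of the 700 to 1100 bp size targeted for Sanger sequencing.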

3 Results

First plastomes of Discocactus and Melocactus

Single contigs were obtained for the assembled plastomes of D. bahiensis and M. ernestii, which included 399,972 (777 ×) and 56,872 (109 ×) assembled reads, respectively. These chloroplast genomes showed a quadripartite structure, with 128,733 bp in D. bahiensis and 130,703 bp in M. ernestii (Fig. 1). The LSC, SSC, and IR regions represented 39.87%, 14.68%, and 45.44% of the D. bahiensis plastome, respectively, with variation in GC levels mainly between the SSC (39.1%) and the IRs (35.8%; Table 1). In the M. ernestii plastome, the LSC, SSC, and IR regions represented 39.44%, 14.61%, and 45.96% of the total plastome, respectively, with similar variation in GC levels (Table 1). The D. bahiensis and M. ernestii plastomes presented 136 genes, of which 92 were protein-coding (62.98 and 62.61% of plastome size, respectively), 40 were tRNA genes (5.64 and 5.63%), and four were rRNA genes (3.51 and 3.45%) (Table 1). There were 20 pseudogenes in both species (accD, cemA, clpP fragment, ndhB, ndhC, ndhD, ndhF, ndhH, ndhJ, psbH, rpl2, rpl16, rpl20, rpl33, rps12, rps16, rps18, ycf1 fragment, ycf3, and ycf3 fragment) and two pseudogenes specific to M. ernestii (ycf2 and ycf68) (Table 2). Intergenic regions represented 27.88 and 28.31%, respectively (Table 1). Comparison of the total plastome size of D. bahiensis and M. ernestii against P. oleracea revealed a loss of 27,800 and 25,830 bp, of which 18,132 and 17,274 bp were from gene regions and 9,668 and 8,556 bp from intergenic regions, respectively (ESM 2, 3, and 4).
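The size-loss figures above can be cross-checked with simple arithmetic. The short sketch below uses only numbers stated in this paragraph; the reference length it derives is implied by those numbers rather than quoted from the text, and the variable names are hypothetical.

```python
# Plastome sizes (bp) and losses relative to P. oleracea, as reported above.
sizes = {"D. bahiensis": 128_733, "M. ernestii": 130_703}
losses = {  # species: (genic loss, intergenic loss), in bp
    "D. bahiensis": (18_132, 9_668),
    "M. ernestii": (17_274, 8_556),
}

total_loss = {sp: genic + inter for sp, (genic, inter) in losses.items()}
# Adding each loss back should recover the same reference plastome length.
implied_reference = {sp: sizes[sp] + total_loss[sp] for sp in sizes}

for sp in sizes:
    genic_share = 100 * losses[sp][0] / total_loss[sp]
    print(sp, total_loss[sp], implied_reference[sp], f"{genic_share:.1f}% genic")
```

Both species imply the same reference length, which shows that the reported totals are internally consistent, and in both cases roughly two thirds of the lost sequence comes from gene regions.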

Fig. 1

Plastomes of Discocactus bahiensis and Melocactus ernestii. A Chloroplast maps. Colours represent different functional gene groups. Dashes in the central circle delimit the LSC, SSC, and IR regions. B Discocactus bahiensis in situ. C Melocactus ernestii (left) in situ

Table 1 Chloroplast genome composition of Discocactus bahiensis and Melocactus ernestii
Table 2 Genes of the Discocactus bahiensis and Melocactus ernestii chloroplast genome

After comparing the plastomes presented in this study with the canonical plastome of P. oleracea, we identified the loss of nine genes: six CDSs (ndhA, ndhE, ndhG, ndhI, ndhK, and rpl23) and three tRNAs (trnA-UGC, trnV-GAC, and trnV-UAC) (ESM 5). The ndh complex was the most affected, lacking five of the 11 genes found in canonical plastomes. Moreover, considerable nucleotide losses were observed when compared with P. oleracea and S. oleracea, reaching up to 94% in the ndhF gene. Genes that are usually duplicated and located in the IRs were translocated to the LSC (rps12 and rps19) or the SSC region (ndhB, rps7, rpl2, rrn4.5, rrn5, rrn16, rrn23, trnI-GAU, trnN-GUU, and trnR-ACG), losing one of their copies. In contrast, translocations and duplications of the following 28 genes from the LSC of the canonical plastomes of P. oleracea and S. oleracea to the IR regions were observed: accD, atpB, atpE, matK, ndhC, ndhJ, petN, psaI, psbA, psbM, rbcL, rpl2, rps4, rps16, rps19, trnC-GCA, trnE-UUC, trnD-GUC, trnF-GAA, trnH-GUG, trnK-UUU, trnL-UAA, trnM-CAU, trnS-GGA, trnT-UGU, trnY-GUA, ycf3 fragment, and ycf4 (Fig. 1, Table 2).

Of the ten intron-containing genes (six protein-coding genes and four tRNAs) common to both species, four were located in the IR regions, while in M. ernestii the ycf1 gene presented an unusual intron (ESM 6 and 7). Introns of the genes rpoC1, clpP, petB, petD, and rpl16 were lost in both species. Other modifications in gene structure were found. While the accD gene of D. bahiensis and M. ernestii was 1,274 and 1,404 bp long, with an unusual intron of 217 and 753 bp, respectively, that of P. oleracea was 1,560 bp long (ESM 8). The ndhB gene showed large structural losses, presenting 922 and 930 bp in D. bahiensis and M. ernestii, respectively, against 2,201 bp in P. oleracea (ESM 9). Finally, the ycf1 gene showed an unusual intron in M. ernestii, with 660 bp and 54.2% identity compared to P. oleracea. Other regions showed identities greater than 95% in exons and 85% in introns.

Comparative analysis of complete plastomes

Compared to a canonical plastome, such as that of P. oleracea, at least 11 inversions would be necessary to obtain the current plastome structure of D. bahiensis and M. ernestii (ESM 10). The number of conserved syntenic blocks varied depending on the species compared. Between P. oleracea, S. oleracea, D. bahiensis, and M. ernestii, we observed the conservation of 17 syntenic blocks (Fig. 2, ESM 5). Carnegiea gigantea showed six syntenic blocks when compared to P. oleracea, indicating a smaller number of rearrangements, while R. baccifera and R. teres showed 11 and ten syntenic blocks, respectively, when compared to P. oleracea (Table 3). For the three plastome structures of Mammillaria, we observed 14 syntenic blocks in M. pectinifera and 16 in M. albiflora, corresponding to structural variant 1 (Solórzano et al. 2019). For structural variant 2, we observed 14 to 17 conserved syntenic blocks (Table 3) and for structural variant 3, 20 syntenic blocks (Table 3). Between all Cactoideae analysed, we found 27 syntenic blocks with stronger similarity in plastome structure between the latter two species (ESM 11).

Fig. 2

Structural relationships between Discocactus bahiensis, Melocactus ernestii and canonical plastomes of Caryophyllales (Spinacia oleracea and Portulaca oleracea). Colours represent conserved syntenic blocks between analysed species as indicated by MAUVE

Table 3 Number of repeats, IR size, and number of syntenic blocks in Discocactus and Melocactus compared to Portulaca oleracea, Spinacia oleracea, and other plastomes of Cactaceae

Search for repeats

To analyse a possible correlation between plastome rearrangements and the presence of repetitive sequences, we searched for repeats between 30 and 200 bp in the plastomes analysed. While P. oleracea and S. oleracea presented only four repeats, M. zephyranthoides and M. ernestii presented eight and nine repeats, respectively. The remaining Cactaceae species presented more than 30 repeats, with more than 66 repeats in M. solisioides (Table 3).

Identification of polymorphic regions for interspecific phylogenetic analyses

Plastome alignment, associated with scrutinizing for polymorphic regions, allowed us to characterize five candidate polymorphic regions suitable for PCR amplification and Sanger sequencing (Table 4). The intergenic trnF-D region had the highest polymorphism index (pi = 0.068) and the intergenic ycf1 fragment-ndhB region had the lowest polymorphism index (pi = 0.033). Contrasting these regions with those analysed by Hernández-Hernández et al. (2011) (rpl16, trnL-F, and trnK/matK), we observed pi below 0.007 (Table 4).

Table 4 Intergeneric polymorphism in plastid regions between Discocactus bahiensis and Melocactus ernestii ordered by polymorphism index (pi). For comparison, the three most frequently used plastid regions in Cactaceae phylogenetic reconstructions are included

The trnF-D and ycf1 fragment-ndhB regions are shared between D. bahiensis, M. ernestii, R. baccifera, and R. teres. However, the trnF-D region has more than 1500 bp in Rhipsalis species and ca. 900 bp in D. bahiensis and M. ernestii. The ycf1 fragment-ndhB region has more than 700 bp for these species. The primers developed for the intergenic spacer rps11-rps8 and for two internal regions of the ycf1 gene (ycf1_1 and ycf1_2) were not fully conserved in the Cactaceae plastomes analysed, because these are highly polymorphic regions, showing substitutions and/or deletions. We recommend these primers for Discocactus, Melocactus and related genera (ESM 12), but new primers may amplify these regions in other species.

4 Discussion

Plastomes of D. bahiensis and M. ernestii show the quadripartite structure typical of plants, with 128,733 and 130,703 bp, respectively, within the known range in Caryophyllales (from 107,343 to 170,974 bp) (Solórzano et al. 2019; Yao et al. 2019). However, when compared with canonical plastomes of the order Caryophyllales (P. oleracea and S. oleracea), those of D. bahiensis and M. ernestii showed a reduction of approximately 20,000 bp in total size. This reduction is mainly related to gene losses. Most photosynthetic angiosperms have a conserved plastome structure (Mower and Vickrey 2018), although occasional modifications in gene organization caused by rearrangements can be observed (Martínez-Alberola et al. 2013; Solórzano et al. 2019; Cauz-Santos et al. 2020).

Our analyses showed very strong restructuring in the plastid genomes of D. bahiensis and M. ernestii, with 17 syntenic blocks when compared with the canonical plastome of P. oleracea. This restructuring can be caused by a series of inversions. In C. gigantea and in species of the genera Mammillaria and Rhipsalis, we observed between six and 20 syntenic blocks using the same parameters. It was hypothesized that IR regions could play a role in the stabilization of plastomes (Palmer and Thompson 1982). However, subsequent studies on Erodium (Geraniaceae), a genus that includes species with and without IRs, indicated that the absence of IRs may not be a determining factor for rearrangements, but rather the presence of small repeats (Blazier et al. 2016; Ruhlman and Jansen 2018). A recent study on the Putranjivoid clade reaffirmed the hypothesis that repeats influence the increase in structural variation (Jin et al. 2020). However, our sampling consisted of 12 plastomes of the subfamily Cactoideae and revealed from six to 20 syntenic blocks between them and the canonical plastome of P. oleracea, showing no correlation between the degree of rearrangement in the plastome and either the absence of IR regions or the number of repeats. Nevertheless, the highest number of repeats was observed in species with reduced or absent IRs. Further studies comprising a higher number of species would determine whether this lack of correlation characterizes the whole subfamily or whether other clades present a different pattern.

Plastid genes of Discocactus bahiensis and Melocactus ernestii: loss and pseudogenization of genes and their implications

Plastomes of D. bahiensis and M. ernestii suffered significant gene losses associated with decreasing genome size and strong restructuring. In photosynthetic angiosperms, plastidial genome restructuring is an infrequent phenomenon (Mower and Vickrey 2018). However, gene loss is not uncommon, and many alterations of different nature and proportions have been documented in plastomes of some groups. In Caryophyllales, significant gene losses have been reported for Droseraceae and Molluginaceae (Nevill et al. 2019; Yao et al. 2019). Carnegiea gigantea and species of the genera Mammillaria and Rhipsalis (Cactoideae, Cactaceae) also presented significant gene losses (Sanderson et al. 2015; Solórzano et al. 2019; Oulo et al. 2020; Silva et al. 2021). Unlike Cactoideae, the subfamily Opuntioideae did not have such drastic gene losses, although presenting alterations of the IRs after duplication of genes previously positioned in the SSC region (Köhler et al. 2020). The number of genes in D. bahiensis and M. ernestii plastomes, slightly higher than in P. oleracea, is explained by the fact that 30 genes are present in the IRs, being duplicated, against only 19 genes in P. oleracea. Nevertheless, D. bahiensis and M. ernestii lost nine genes. Thus, gene losses are common in the Cactaceae family, especially in the subfamily Cactoideae, affecting mainly the ndh complex (Sanderson et al. 2015; Solórzano et al. 2019; Silva et al. 2021), as in other groups (Kim et al. 2015; Lin et al. 2015).

The NAD(P)H-dehydrogenase complex is known to be the most affected by losses and pseudogenizations (Sanderson et al. 2015; Solórzano et al. 2019; Silva et al. 2021). Accordingly, in this study we observed that five subunits (ndhA, ndhE, ndhG, ndhI, and ndhK) were lost and six (ndhB, ndhC, ndhD, ndhF, ndhH, and ndhJ) were pseudogenized in D. bahiensis and M. ernestii. In Cactoideae, three plastome structures are known for Mammillaria. In two of them, the ndh subunits have been lost or pseudogenized, and only one has preserved a functional subunit (Solórzano et al. 2019). In R. baccifera, five NAD(P)H-dehydrogenase complex subunits were found, with ndhD designated as a functional gene (Oulo et al. 2020). In R. teres, six NAD(P)H-dehydrogenase complex subunits were found, with ndhJ annotated as a functional gene (Silva et al. 2021). Carnegiea gigantea, also part of Cactoideae, lost nine of the 11 subunits of the ndh complex, with the remaining two subunits being pseudogenized. In addition, strong evidence has supported the translocation of the ndhF subunit to the nucleus (Sanderson et al. 2015). This complex is responsible for chlororespiration (Peltier and Cournac 2002) and mainly for the transfer of electrons from NADH to plastoquinone, protecting the cell against photooxidation-related stress while maintaining cyclic rates of photophosphorylation (Martín and Sabater 2010). In heterotrophic plants, the loss of the ndh complex has been indicated as the first step towards plastome size decrease in these groups (Barret and Davis 2012; Barret et al. 2014). Experiments in Marchantia polymorpha (Marchantiaceae) showed that plastid-encoded ndhB is not essential for plant survival (Ueda et al. 2012). In the Orchidaceae, the ndh complex was partially or completely lost (Kim et al. 2015) and translocation to mitochondrial DNA has been well documented (Lin et al. 2015). As with orchids, there are many examples of substantial losses of the ndh complex in Cactoideae.
Since the chloroplast organelle is responsible for vital processes like photosynthesis, it has been argued that gene loss can impair the efficiency of some metabolic pathways, plant growth, and cell survival in autotrophic plants (Neuhaus and Emes 2000; Kode et al. 2005; Rogalski et al. 2006; Romani et al. 2012). In many cases, the absence of these genes is compensated by the existence of a copy transferred from chloroplast to the nuclear genome, as demonstrated in legumes (Gantt et al. 1991). However, as we did not address possible translocations from plastid to mitochondrial or nuclear DNA, events of this nature in Discocactus and Melocactus await future investigations.

The rpl23 gene is absent and rpl2, rpl16, rpl20, and rpl33 were pseudogenized in the D. bahiensis and M. ernestii plastomes. Evidence of pseudogenization and loss of subunits of the rpl complex has also been found in C. gigantea (Sanderson et al. 2015) and in seven species of Mammillaria (Solórzano et al. 2019). However, rpl23 is present in P. oleracea (Liu et al. 2017) and O. quimilo (Cactaceae, Köhler et al. 2020). In addition, there is a strong indication that the rpl23 gene is a pseudogene in the entire Caryophyllales, supported by the independent changes of this gene across ten clades (Yao et al. 2019). The rpl complex is responsible for encoding proteins from nine large ribosomal subunits (Wicke et al. 2011). Other examples of loss of rpl subunits were observed for Castanea and Quercus (Fagaceae), with the rpl22 subunit translocated to the nucleus, as well as in Passiflora (Jansen et al. 2010).

Genes of ribosomal proteins (SSU) (rps12, rps16, and rps18), miscellaneous proteins (accD, cemA, and clpP fragment), hypothetical proteins and conserved reading frame sets (ycf1 fragment, ycf3, and ycf3 fragment) were pseudogenized in D. bahiensis and M. ernestii, whereas genes ycf2 and ycf68 were pseudogenized only in M. ernestii. The genes reported above were categorized as pseudogenes for at least one species of Cactoideae except for cemA and ycf3, while the ycf1 and ycf3 fragments were absent (Sanderson et al. 2015; Solórzano et al. 2019). The ycf1 and the ycf3 gene fragments were added to this category because only a fragment of each gene was present in the IR. Pseudogenization of the cemA gene was reported for the first time for Caryophyllales based on the extensive study conducted by Yao et al. (2019).

Besides the complete loss of intron-bearing genes, such as ndhA, trnA-UGC, and trnV-UAC, D. bahiensis and M. ernestii lost the introns of the clpP, petB, petD, and rpl16 genes. The loss of the intron of the clpP gene was extensively documented by Yao et al. (2019), occurring independently in four different families of Caryophyllales, represented by Anacampseros filamentosa (Anacampserotaceae), Portulaca grandiflora (Portulacaceae), Silene chalcedonica (Caryophyllaceae), and the clade formed by Hypertelis spergulacea and Pharnaceum aurantium (Molluginaceae). In this study, we report for the first time the loss of this intron for a fifth family (Cactaceae), as well as the loss of the introns in the petB, petD, and rpl16 genes for Caryophyllales. Consequences of the loss of introns have not yet been clarified (Jin et al. 2020). Notwithstanding, D. bahiensis and M. ernestii developed an atypical intron in the accD gene, which normally bears no intron, resulting from the insertion of 217 and 753 bp, respectively.

Providing potential polymorphic regions for phylogenetic studies in Melocactus and Discocactus

Unsuccessful attempts to resolve the molecular phylogenetic relationships of Melocactus (Taylor et al. 2014) agree with its recent diversification (Silva et al. 2017) and expected low nucleotide diversity. The advent of NGS has made it possible to assemble plastomes of different groups, helping phylogenetic analyses at various taxonomic levels (Guo et al. 2017; Ji et al. 2019). It also makes it possible to propose regions with higher rates of nucleotide substitution for future infrageneric studies by Sanger sequencing. Hence, we aligned the D. bahiensis and M. ernestii plastomes and identified five promising regions (trnF-D, ycf1_1, ycf1_2, rps11-rps8, and ycf1 fragment-ndhB).

The new regions proposed showed higher polymorphism index (pi) for D. bahiensis and M. ernestii when compared to conventional plastid regions (trnL-F, trnK/matK, and rpl16). The trnF-D and ycf1 fragment-ndhB intergenic regions were generated by rearrangements in D. bahiensis and M. ernestii plastomes, not being suitable for the phylogenetically distant Cactoideae. However, ycf1_1, ycf1_2, and rps11-rps8 may constitute new regions for understanding phylogenetic relationships in Cactoideae. Nevertheless, the primers we present here do not seem to amplify in C. gigantea, Mammillaria, nor Rhipsalis species, because they have substitutions in those regions. Thus, our results indicate that ycf1_1, ycf1_2, and rps11-rps8 regions may be useful for phylogenetic studies within the Cactoideae subfamily with new primers, while the trnF-D and the ycf1 fragment-ndhB regions could be more relevant for discussing the relationships within Discocactus, Melocactus and closely related genera.

In conclusion, we present here the first de novo assembled plastomes of the genera Discocactus and Melocactus. Compared to other plastomes from the subfamily Cactoideae, they revealed a highly rearranged structure, showing the largest inverted repeat (IR), numerous gene and intron losses, and pseudogenizations, mainly in the ndh complex. Neither IR loss nor a high number of repeats explained the rearrangements that occurred within Cactoideae. Finally, we propose new primers for more polymorphic plastid regions that will be suitable for intra- and intergeneric phylogenetic analyses.