Introduction

Coffee is one of the most important beverages worldwide, with more than 2.25 billion cups consumed daily (Denoeud et al. 2014). More than 75% of traded coffee is produced by Coffea arabica L. and 35% by Coffea canephora Pierre ex A. Froehner. The genus Coffea L. (Rubiaceae) has a recent origin (7.87 Mya, Tosh et al. 2013) and includes more than 124 species native to Africa, Australasia, Comoros, India, Madagascar, Mascarenes and Papua New Guinea. Almost all diploid (2n = 2x = 22) species are self-incompatible, with a few exceptions, such as C. anthonyi Stoff. & F. Anthony. C. arabica is the only polyploid in the genus, with 2n = 4x = 44 chromosomes, and is self-compatible (Hamon et al. 2017).

Cytogenetics concerns genomic and epigenomic aspects of the chromosomes, including their number, structure, organization, function, evolution, and behavior during the cell cycle and meiosis (Singh 2016; Deakin et al. 2019). Coffea cytogenetics began in the early twentieth century with the work of von Faber (1912). As in other plant species, technical limitations have hampered cytogenetic investigations in Coffea. The main problem has been obtaining mitotic and/or meiotic chromosomes, i.e., the biological material needed to conduct cytogenetic studies. Even today, one of the main challenges is to find alternative ways of obtaining Coffea chromosomes, mainly for cytogenomic experiments involving in situ hybridization. In addition, the small size of the chromosomes and the low metaphase index of root meristems have also hampered Coffea cytogenetics. Improved protocols allowed progress in research on the Coffea chromosome complement, such as the analysis of pachytene chromosomes (Pinto-Maglio and Cruz 1987), improved methods for mitotic chromosome preparation and the replacement of root meristems by cell aggregate suspensions (CAS, Clarindo and Carvalho 2006). As a result, karyotype data have been expanded and refined, contributing to a better understanding of the Coffea genome. Coffea cytogenetics is therefore important for taxonomic and evolutionary studies, DNA sequence mapping, functional genome annotation (Hamon et al. 2009; Yuyama et al. 2012) and coffee breeding programs (Clarindo and Carvalho 2009). In this review, we revisit the cytogenetic studies of Coffea to report the advances and contributions in this genomic era, as well as the main challenges and perspectives for further studies. In addition, we report new data on the 2n karyotype of the diploid C. eugenioides and the polyploid “Híbrido de Timor” (HT), expanding the genomic data about Coffea and its diversification and evolution.

Initial steps—chromosome number determination

The first chromosome count in Coffea, made from micro- and megasporogenesis, indicated that C. arabica has 2n = 16 chromosomes (von Faber 1912). Twenty years later, Homeyer (1932) counted 2n = 22 chromosomes for the same species. Based on chromosome numbers of other Rubiaceae genera, such as Sherardia L., Crucianella L., Asperula L. and Galium L., Homeyer suggested the basic chromosome number of x = 11 for Coffea. The most striking breakthroughs in Coffea cytogenetics were achieved in the 1930s, mainly at the Instituto Agronômico de Campinas, Brazil, and by French and Belgian researchers. Krug (1934) reported 2n = 44 chromosomes for five C. arabica varieties (‘Nacional’, ‘Bourbon’, ‘Laurina’, ‘Maragogipe’ and ‘Amarelo de Botucatu’) and 2n = 22 for C. canephora, Coffea liberica Hiern and Coffea congensis A. Froehner, confirming the basic chromosome number of x = 11. From the 2n chromosome number, C. arabica was therefore recognized as a tetraploid species of Coffea (Krug 1934) and started to stand out not only for its economic relevance but also for the genomic events involved in its karyotype evolution. Despite being a tetraploid, C. arabica shows a diploid-like meiotic behavior, forming only bivalents from prophase I until metaphase I. Although Krug did not carry out a detailed morphological analysis, he described the Coffea chromosomes as small (approximately 1–2 µm) and homomorphic. Because of these karyotype features, several cytogenetic techniques have been applied to characterize the chromosomes and to understand the evolution of the Coffea genome. In the following years, the chromosome number of many other species, varieties and hybrids of Coffea was determined as 2n = 22 or 2n = 44 (Krug 1937; Mendes 1938; Bouharmont 1959, 1963; Sybenga 1960; Conagin and Mendes 1961).

Karyotype morphology and the first karyograms

Coffea chromosomes have been considered a hindrance for karyogram assembly due to their small size and highly similar morphology. Mendes (1938) was the first to report a morphological characterization of Coffea chromosomes, describing solely those of Coffea liberica ‘Dewevrei’ (De Wild. & T. Durand) Lebrun (syn. Coffea excelsa Chev.). Although C. liberica ‘Dewevrei’ chromosomes were larger than those of other Coffea species, they were also considered small and morphologically similar, ranging from 3.5 (chromosome 1) to 1.5 µm (chromosome 11). Only three classes of chromosomes were distinguishable based on their total length, being designated as A, B and C. Class A was composed of three chromosome pairs with 2–3.5 µm, class B of four pairs with around 2 µm, and class C of four pairs with 1–2 µm (Mendes 1938). Bouharmont (1959, 1963) determined the number and length of the mitotic chromosomes of sixteen Coffea species, characterizing them as small and homomorphic and supporting the hypothesis that the Coffea basic chromosome number is x = 11. Owing to the high intra- and interspecific similarity among chromosomes, the author was only able to distinguish five (1, 2, 3, 4 and 11) of the eleven chromosomes.

Pachytene chromosomes made it possible to overcome the hindrance associated with the small size of mitotic metaphase chromosomes. Characterization of the pachytene bivalents bearing the nucleolar organizer region (NOR) provided insights into the polyploid origin of C. arabica, by comparing their morphology with that of the NOR bivalents of diploid species. One C. arabica NOR bivalent was similar to those of Coffea eugenioides S. Moore and C. liberica ‘Dewevrei’ with regard to total and arm lengths. The other C. arabica NOR bivalent showed a simpler chromomeric pattern, most similar to the bivalents II of Coffea salvatrix Swynn. & Philipson and Coffea racemosa Lour (Pinto-Maglio and Cruz 1987).

The 22 pachytene bivalents of C. arabica were later characterized as four metacentric (1, 8, 11 and 18), fourteen submetacentric (2, 3, 4, 5, 6, 7, 9, 10, 12, 15, 16, 17, 19 and 20) and four acrocentric (13, 14, 21 and 22). Three bivalents presented the NOR (14, 20 and 21). In general, the chromomeres were located near the centromere, and similarities in the chromomeric pattern were shown among 54% of the 22 bivalents of C. arabica, evidencing some level of homoeology between them. Based on these cytogenetic markers and on information gathered from the literature, the authors suggested that such similarities among C. arabica bivalents are consistent with a segmental allopolyploid origin (Pinto-Maglio and Cruz 1998). Thus, mitotic cytogenetics revealed the ploidy level of the Coffea species, mainly the tetraploidy of C. arabica, and meiotic cytogenetics indicated the genomic origin of this species. Unfortunately, no new data on Coffea meiotic chromosomes have been published since.

Besides the small size and homogeneous morphology of Coffea chromosomes, the low seed germination rate and low metaphase index have also been pointed out as hindrances for cytogenetics (Conagin and Mendes 1961; Iacia and Pinto-Maglio 2013). To overcome this barrier, our research group (Clarindo and Carvalho 2006, 2008, 2009; Clarindo et al. 2012) replaced root meristems with in vitro CAS as the source material (Box 1, Fig. 1). High metaphase indexes were thereby achieved for C. canephora, C. congensis, C. eugenioides and C. arabica, yielding chromosomes with total lengths of up to about 5 µm (5.30 µm for chromosome 1 of C. arabica, for instance; Clarindo and Carvalho 2008). Such chromosomes were suitable for the assembly of the first karyograms at distinct levels of chromatin condensation, as well as for the application of banding procedures. C. canephora and C. congensis possess 2n = 2x = 22 chromosomes, with two metacentric (4 and 9) and nine submetacentric pairs (1, 2, 3, 5, 6, 7, 8, 10 and 11). The submetacentric chromosome pair 6 of both species shows a secondary constriction (SC) in the short arm, which was confirmed for C. canephora by Ag-NOR and Hsc-FA. C. arabica has 2n = 4x = 44 chromosomes, with 5 pairs classified as metacentric (7, 8, 13, 14 and 20), 16 as submetacentric (1–6, 9–12, 15–19 and 21) and one pair as acrocentric (22).
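The pair numbers listed above can be checked against the stated chromosome numbers. The following is a minimal sketch in Python; the pair numbers come from the text, whereas the data layout and the names `expand`, `KARYOTYPES` and `is_complete` are illustrative assumptions rather than part of any published protocol.

```python
def expand(spec):
    """Expand a string such as '1-6, 9-12, 21' into a list of integers."""
    numbers = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # an empty class (e.g. no acrocentric pairs)
        if "-" in part:
            start, end = (int(p) for p in part.split("-"))
            numbers.extend(range(start, end + 1))
        else:
            numbers.append(int(part))
    return numbers


# Chromosome pairs per morphological class, as listed in the text.
KARYOTYPES = {
    "C. canephora / C. congensis": {
        "2n": 22,
        "metacentric": "4, 9",
        "submetacentric": "1, 2, 3, 5, 6, 7, 8, 10, 11",
        "acrocentric": "",
    },
    "C. arabica": {
        "2n": 44,
        "metacentric": "7, 8, 13, 14, 20",
        "submetacentric": "1-6, 9-12, 15-19, 21",
        "acrocentric": "22",
    },
}


def is_complete(entry):
    """True if every pair from 1 to 2n/2 is assigned to exactly one class."""
    pairs = sorted(
        expand(entry["metacentric"])
        + expand(entry["submetacentric"])
        + expand(entry["acrocentric"])
    )
    return pairs == list(range(1, entry["2n"] // 2 + 1))


for species, entry in KARYOTYPES.items():
    print(species, "classification complete:", is_complete(entry))
```

For both karyotypes, the classes cover each chromosome pair exactly once (11 pairs for the diploids, 22 pairs for C. arabica).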

Fig. 1

Schematic representation of a workflow to establish Coffea CAS cultures for obtaining prometaphase/metaphase chromosomes suitable for multiple cytogenomic analyses. More details are presented in the text of Box 1

“Híbrido de Timor” (HT), a natural hybrid between C. canephora and C. arabica, had its karyotype characterized and its karyogram assembled using our adapted protocol. The semi-fertile HT ‘CIFC 4106’, a vegetatively propagated accession derived from the original HT plant, shows 2n = 3x = 33 chromosomes. Therefore, it was considered an allotriploid (Clarindo et al. 2013). We classified the chromosomes, assembled the HT ‘CIFC 4106’ karyogram and present it here for the first time. Due to its anorthoploid condition (odd number of chromosome complements), the karyotype exhibits not only chromosome pairs, but also individual chromosomes and groups of three or four chromosomes (Fig. 2). Some chromosomes appear to be more similar to those of C. canephora or C. arabica, providing new cytogenetic information regarding the allopolyploid origin of HT. Such evolutionary aspects of the Coffea genome and the interspecific hybridizations throughout the history of the genus are discussed in the following sections.

Fig. 2

HT ‘CIFC 4106’ karyogram evidencing individual and grouped chromosomes, which were defined according to their total length and classification. Nineteen chromosomes (1, 4–9, 12–14, 19, 22–24, 27, 30–33) showed at least one particular cytogenetic feature. The paired chromosomes were 2–3, 10–11, 20–21, 25–26, and 28–29; and the chromosomes grouped in four were 15–18. Bar = 5 μm
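As a quick consistency check on the caption, the chromosome numbers it lists can be tested for forming a complete, non-overlapping assignment of the 33 chromosomes. This is only a sketch: the variable names are hypothetical, and the numbers are transcribed from the caption.

```python
# Chromosome numbers transcribed from the Fig. 2 caption (HT 'CIFC 4106').
singles = [1, 4, 5, 6, 7, 8, 9, 12, 13, 14, 19, 22, 23, 24, 27, 30, 31, 32, 33]
pairs = [(2, 3), (10, 11), (20, 21), (25, 26), (28, 29)]
quartets = [(15, 16, 17, 18)]


def flatten(groups):
    """Turn a list of tuples into a flat list of chromosome numbers."""
    return [chromosome for group in groups for chromosome in group]


assigned = sorted(singles + flatten(pairs) + flatten(quartets))

# Every chromosome from 1 to 33 should appear exactly once (2n = 3x = 33).
is_partition = assigned == list(range(1, 34))
print(len(singles), "single +", len(flatten(pairs)), "paired +",
      len(flatten(quartets)), "grouped =", len(assigned))
print("complete and non-overlapping:", is_partition)
```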

Chromosome banding and the onset of Coffea cytogenomics

C- and NOR-banding were the first banding techniques applied in Coffea, mapping the constitutive heterochromatin and the NOR, respectively. C-bands in Coffea occur preferentially at pericentromeric/centromeric regions. Corroborating the karyotype characterization, C-banding shows the karyotype asymmetry of the genus, varying from 62.3% (C. eugenioides) to 64.32% (C. liberica), with a prevalence of submetacentric chromosomes. On the other hand, the number of active NOR sites varies, with C. kapakata, C. congensis, C. eugenioides, C. liberica, C. canephora and C. liberica var. dewevrei displaying one pair with a positive NOR band (presumably number 3), and C. racemosa, C. salvatrix and C. stenophylla with two positive bands in chromosomes 1 and 3 (Pierozzi et al. 1999, 2012; Pierozzi 2013).

Differential staining with 4′,6-diamidino-2-phenylindole (DAPI) and chromomycin A3 (CMA3) has been carried out in Coffea species to map the AT- and GC-rich regions (DAPI+ and CMA3+ bands, respectively), helping to identify specific karyotype signals in several species, even those whose genetic or botanical status is still under debate. DAPI+ bands were only found in C. stenophylla among 12 analyzed Coffea species (Pinto-Maglio et al. 2000, 2001; Barbosa et al. 2001; Lombello and Pinto-Maglio 2004a, b, c). After the DNA denaturation/renaturation of a fluorescence in situ hybridization (FISH) procedure, specific interstitial DAPI+ bands were detected in nine Coffea species. CMA3+ bands were found in all Coffea species studied so far. In general, the strongest CMA3+ fluorescence signals are associated with the SC and co-localized with the ribosomal DNA (rDNA) loci detected by FISH. Additional CMA3+ bands may also occur adjacent to, or interspersed with, the interstitial 5S rDNA (Hamon et al. 2009).

Karyotype knowledge obtained from classical cytogenetics has been the basis for questions that have been investigated mainly by cytogenomics, allowing a more refined characterization of the Coffea genome. Cytogenomics made it possible to: (1) discover the diploid progenitors of C. arabica (Raina et al. 1998; Lashermes et al. 1999); (2) obtain information for application in breeding programs (Herrera et al. 2007); and (3) obtain more details on the organization and evolution of Coffea genomes (Hamon et al. 2009). The identity of the C. arabica diploid progenitors has been in the spotlight of Coffea research since the discovery of its tetraploid condition by Krug (1934). Currently, wild C. arabica populations are found mainly in the Afromontane rainforests of southwest Ethiopia and the Boma Plateau of Sudan (Bawin et al. 2020). Although the geographic range of C. arabica does not overlap with that of any other Coffea species, its proximity to wild populations of C. eugenioides and of the Canephoroid species C. canephora, C. congensis and C. brevipes (Benth.) H.S. Irwin & Barneby made these the most likely diploid progenitors. The first phylogenetic inferences based on chloroplast genes and rDNA confirmed the high genetic proximity among these species (Lashermes et al. 1995).

Using genomic DNA probes from four diploid Coffea species in genomic in situ hybridization (GISH), 22 C. arabica chromosomes hybridized preferentially with C. eugenioides probes, while the other 22 chromosomes hybridized more strongly with the C. congensis probes (Raina et al. 1998). Also using GISH, Lashermes et al. (1999) suggested that C. canephora and C. eugenioides, or their related ecotypes, are most likely the two diploid progenitors of the tetraploid C. arabica. In a phylogenomic approach using genotyping-by-sequencing, the genetic distances were estimated between C. arabica and 23 other species, including all those known to be most closely related to C. arabica, such as C. eugenioides, C. canephora, C. congensis and C. brevipes. C. eugenioides and C. canephora were confirmed as the putative female and male progenitor species, respectively, which hybridized between 1.08 million and 543 thousand years ago (Bawin et al. 2020).

By integrating the mapping of DAPI+, CMA3+ and rDNA sites, two to five chromosome pairs were discriminated in the karyotypes of 16 Coffea species, including the allotetraploid C. arabica. However, the results were not suitable to discriminate between C. canephora and C. congensis as putative progenitors. Differences in the number of 18S and 5S rDNA loci were revealed and related to the biogeographical region of the 16 analyzed species. Most of the East African species, such as C. eugenioides, C. salvatrix and C. racemosa, possessed two chromosome pairs with the 18S rDNA locus and one pair with the 5S rDNA, while the majority of the West and Central African species exhibited one chromosome pair with the 18S and two with the 5S rDNA (Hamon et al. 2009).

Alien chromatin in interspecific hybrids between C. arabica and C. canephora, as well as in an introgressed line derived from a cross between C. arabica and C. liberica, was identified by GISH and BAC-FISH (fluorescence in situ hybridization using bacterial artificial chromosomes). GISH results from the interspecific hybrids revealed a close affinity between the C. arabica and C. canephora genomes, evidencing that a low rate of structural modifications has occurred in both genomes since C. arabica speciation (Bawin et al. 2020). In addition, GISH/BAC-FISH in the introgressed line karyotype detected and physically located the C. liberica-introgressed DNA sequences carrying the SH3 factor for resistance against Hemileia vastatrix Berk. & Broome (Herrera et al. 2007).

Coffea genomics and its integration with cytogenetics

During the 1990s, the availability of DNA-based molecular markers allowed rapid progress in coffee genomics. As for other crops, early genomic studies in Coffea were mainly focused on assessing the genetic diversity and phylogenetic relationships, constructing genetic maps and identifying quantitative trait loci (QTLs) (reviewed by Lashermes et al. 2008; de Kochko et al. 2010). Genetic diversity was investigated with several molecular markers in wild Coffea species and, with a larger effort, in C. arabica and C. canephora. The considerably narrow genetic basis of both wild and cultivated C. arabica populations was the most relevant feature noticed by these studies (Vossen 1985; Lashermes et al. 1996; Scalabrin et al. 2020). This bottleneck is mainly explained by the autogamous reproduction system, with an outcrossing rate of around 10%, and by the founder effect resulting from the small number of individuals introduced to America in the first commercial plantations (Carvalho and Krug 1949; Setotaw et al. 2013). In addition, the origin of C. arabica from a single allopolyploidization event also contributed to the low genetic diversity observed for the species (Lashermes et al. 2014). Such low genetic variability is a challenge for the identification and selection of superior genotypes, and molecular markers have played a fundamental role in discriminating between genotypes (Sousa et al. 2017). Due to the low genetic variability, breeding programs have devoted efforts to the introgression of genes of interest from different species and hybrids, including C. canephora (Lashermes et al. 2011), C. liberica (Prakash et al. 2004) and HT (Setotaw et al. 2020). As the genomes of Coffea species exhibit considerable similarities, interspecific crossing (hybridization) is possible and often used for gene introgression (Charrier and Berthaud 1985; Anthony et al. 2011).

For C. canephora, the first genetic maps were constructed as soon as molecular markers became available for Coffea (reviewed by de Kochko et al. 2010). In 2014, the most complete genome sequence of C. canephora (accession number: PRJEB4211) was published along with a high-density genetic map constructed using several molecular markers and 3,230 loci distributed on 11 linkage groups, matching the basic chromosome number of this species (x = 11). The C. canephora genetic map was integrated with the sequenced genome, which covered 80% of the 710 Mbp total genome. There was a considerable variation in the physical:genetic map distances, with crossovers occurring at higher frequencies in regions with a lower density of repeats (Denoeud et al. 2014). The complexity of the C. arabica allotetraploid genome and its low genetic diversity were the main challenges to constructing a high-density linkage map. Nevertheless, a high-density linkage map was published for this species showing 22 linkage groups with 848 markers. This genetic map was also successfully used to identify QTLs associated with coffee yield, plant height, and bean size (Moncada et al. 2016).

As with the construction of genetic maps, sequencing a polyploid genome is challenging, since the presence of homoeologous chromosomes hampers the assembly of each haploid complement. The C. arabica draft genome (public accession number: PRJNA554647) was released in 2020, with the two component subgenomes independently assembled. In accordance with the previous studies reported here, a low genetic diversity was confirmed, which was likely caused by a severe bottleneck resulting from a single polyploidization event at the origin of C. arabica (Scalabrin et al. 2020). Moreover, two other genomic sequences have also been reported for C. arabica (NCBI: GCA_003713225.1 and Phytozome: genome ID 453). In addition to C. canephora and C. arabica, the C. eugenioides (accession number: PRJNA497891) and Coffea humblotiana Baill. (accession number: PRJNA665152, Raharimalala et al. 2021) genomes are also publicly available. C. humblotiana is the sole Coffea species endemic to the Comoros archipelago and was probably cultivated and consumed in the past. Its most noteworthy feature is the complete absence of caffeine in seeds and leaves, which is explained by the loss, most likely through illegitimate recombination, of one of the caffeine synthase (DXMT) genes, which convert theobromine into caffeine (Raharimalala et al. 2021). Despite the significant difference in genome size, there is a high degree of synteny between the genomes of C. humblotiana (2C = 0.97 pg, Razafinarivo et al. 2012) and C. canephora (2C = 1.43 pg, Clarindo et al. 2012); the difference between the two values corresponds to about 32% of the C. canephora nuclear DNA content.
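The two 2C values quoted above allow the genome size difference to be expressed relative to either species, which gives quite different percentages. The sketch below works through that arithmetic. The conversion factor of 978 Mbp per pg is a widely used approximation that is not given in the text and is included here as an assumption, and the function and variable names are illustrative.

```python
# 2C nuclear DNA contents quoted in the text (picograms).
HUMBLOTIANA_2C = 0.97
CANEPHORA_2C = 1.43

# Assumption (not from the text): 1 pg of DNA is roughly 978 Mbp.
PG_TO_MBP = 978.0


def percent_of(difference, reference):
    """Express a difference as a percentage of a reference value."""
    return difference / reference * 100.0


difference = CANEPHORA_2C - HUMBLOTIANA_2C

# Difference relative to the larger genome (C. canephora): about 32%.
relative_to_canephora = percent_of(difference, CANEPHORA_2C)

# Difference relative to the smaller genome (C. humblotiana): about 47%.
relative_to_humblotiana = percent_of(difference, HUMBLOTIANA_2C)

# Approximate haploid (1C) genome size of C. canephora in Mbp.
canephora_1c_mbp = CANEPHORA_2C / 2 * PG_TO_MBP

print(f"relative to C. canephora:   {relative_to_canephora:.2f}%")
print(f"relative to C. humblotiana: {relative_to_humblotiana:.2f}%")
print(f"C. canephora 1C size:       {canephora_1c_mbp:.0f} Mbp")
```

Under the assumed conversion, the 1C value of C. canephora comes out close to 700 Mbp, in line with the 710 Mbp total genome size cited for the sequenced genome.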

Recent advances in next-generation sequencing technologies and bioinformatic tools have provided a growing amount of genomic data, laying the groundwork for integration with cytogenetics and giving rise to the cytogenomics era (Talukdar and Sinjushin 2015). The integration of linkage maps, cytogenetic maps and sequencing data is fundamental to define genome regions that are not yet well characterized (Kim et al. 2005). Because linkage map distances are based on recombination rates and are not simply related to physical distances, physical mapping is also needed to confirm the locations of DNA sequences (Koo et al. 2008). Physical mapping can be obtained through genome sequencing or through in situ localization using cytogenomics. The major challenge related to sequencing is the resolution of complex repetitive sequences. As the DNA must be fragmented and sequenced as short reads, usually around 100 bp, repeats create computational ambiguities during alignment and assembly, which might produce biases and errors when interpreting results (Schatz et al. 2012; Treangen and Salzberg 2012). Cytogenomics, on the other hand, is efficient at mapping repetitive sequences, revealing their physical localization in situ on the mitotic or meiotic chromosomes (Larracuente and Ferree 2015). Therefore, linkage maps, cytogenetic maps and sequencing are complementary, and integrating these data is fundamental for a more detailed and accurate knowledge of the Coffea genome.
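The ambiguity that repeats introduce can be illustrated with a toy example: a short sequenced fragment taken from a unique region matches one position in the genome, whereas a fragment taken from a repeated element matches every copy equally well. The sketch below uses exact string matching on an invented sequence. The sequences and names are hypothetical, and real aligners and assemblers work on far longer sequences and tolerate mismatches.

```python
def find_all(genome, read):
    """Return every start position where `read` matches `genome` exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions


# Toy sequence (hypothetical): two copies of a repeat separated by unique DNA.
REPEAT = "ACGTTGCAACGT"
GENOME = "GATTACAGGC" + REPEAT + "TTCCGGAATC" + REPEAT + "CAGTCAGGTA"

unique_read = GENOME[2:10]    # fragment taken from the unique left flank
repeat_read = GENOME[12:20]   # fragment taken from inside the first repeat copy

print("unique read maps to:", find_all(GENOME, unique_read))   # [2]
print("repeat read maps to:", find_all(GENOME, repeat_read))   # [12, 34]
```

The repeat-derived fragment cannot be assigned to one copy from its sequence alone, which is the kind of ambiguity that in situ localization on chromosomes helps to resolve.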

The complete genome sequence released in 2014 for C. canephora showed that mobile elements represent more than 50% of its genome, among which ~ 85% belong to the LTR-retrotransposon class (Denoeud et al. 2014). Due to their high frequency (~ 15% to > 70%) and pivotal roles in plant genome organization, function and evolution, the sequences of mobile elements (for example, their genes and repetitive sequences) and their distribution along the genome have been analyzed in several plant species (Civáň et al. 2011; Wicker et al. 2018). Mobile elements comprise DNA sequences with the ability to insert themselves (transposons) or new copies of themselves (retrotransposons, RTEs) into new locations within a genome (Civáň et al. 2011). For the Coffea sequenced genomes analyzed so far, the proportions of LTR-retrotransposons, for instance, varied from 32% for C. humblotiana (1C = 0.49 pg, Razafinarivo et al. 2012) to 53% for Coffea heterocalyx Stoff. (1C = 0.87 pg, Noirot et al. 2003). The variation in the abundance and types of mobile elements can reflect the divergence of botanical groups and also the evolution of species within these groups (Guyot et al. 2016). From the genomic data provided by sequencing technologies, DNA sequence probes have been constructed for the cytogenetic mapping of mobile elements, allowing comparative cytogenomic analysis of their distribution patterns among different species. This integration between genomics and cytogenetics therefore provides valuable information for understanding karyotype evolution in Coffea.

Some mobile elements have been mapped in Coffea species, including DNA transposons (Lopes et al. 2013), Long Terminal Repeat (LTR) retrotransposons (Yuyama et al. 2012; Herrera et al. 2013; Lopes et al. 2013) and centromeric retrotransposons (Nunes et al. 2018). Nonetheless, the low longitudinal resolution of the chromosomes did not allow precise mapping, only an overview of the distribution patterns along the genomes. The DNA transposons MuDR and Tip100, for instance, exhibit a preferential clustering in terminal positions of C. canephora and C. eugenioides chromosomes, while C. arabica showed larger numbers of interstitial signals. This distribution indicates an increased transposition activity in the allotetraploid (Lopes et al. 2013), which is consistent with the well-known hypothesis that polyploidization can induce a burst in mobile element activity (Vicient and Casacuberta 2017).

The detailed analysis of the centromeric mobile element composition of C. arabica, C. canephora and C. eugenioides revealed a considerable diversity in centromeric retrotransposons of Coffea (CRC) from the Ty3/Gypsy superfamily, which were divided into ten groups according to the sequence and similarity of the Reverse Transcriptase domain. Generalist probes for these CRCs exhibited a variable fluorescence signal pattern among species and among chromosomes of the same species. While in C. eugenioides the signals were restricted to centromeric regions, in C. canephora and C. arabica the signals appeared slightly scattered along interstitial regions and less specific to centromeres. In addition, C. arabica presented two pairs without bright signals, which might be homologous to the chromosomes without signals from the parental diploids C. canephora and C. eugenioides (one pair each) (Nunes et al. 2018).

Allopolyploidy in Coffea

The genus Coffea has a recent monophyletic origin around 5–25 Mya, and no event of whole genome duplication (WGD) occurred immediately prior to or after the radiation of the Rubiaceae family (Orozco-Castillo et al. 1996; Wu et al. 2006; Mahé et al. 2007; Cenci et al. 2010). Therefore, the common ancestor shared by all Coffea species probably had the basic chromosome number of x = 11. In addition, the diversification and speciation of each Coffea species occurred by DNA sequence changes (mutations) and small chromosomal rearrangements, which have been progressively identified and characterized through classical cytogenetics and cytogenomics (Rijo 1974; Yu et al. 2011; Denoeud et al. 2014; Raharimalala et al. 2021). A micro-collinearity analysis between orthologous BACs of C. canephora and C. arabica, for instance, evidenced a high level of sequence similarity, but numerous small chromosomal rearrangements, including inversions, deletions and insertions (Yu et al. 2011; Cenci et al. 2012).

Allotetraploidy and diversification of C. arabica

Krug’s (1934) discovery of the tetraploidy of C. arabica ignited a long debate on the origin (ancestry) and evolution (“omics” changes) of this species. As mentioned above, there was an agreement that C. arabica originated from hybridization between two diploid species with similar genomes, with the potential progenitors being C. eugenioides (Berthou 1983; Lopes et al. 1984; Orozco-Castillo et al. 1996; Raina et al. 1998; Lashermes et al. 1999; Ruas et al. 2003), C. canephora (Lashermes et al. 1997, 1999; Ruas et al. 2003; Clarindo and Carvalho 2009), C. congensis (Höfling and Oliveira 1981; Lashermes et al. 1997; Raina et al. 1998) and C. brevipes (Lashermes et al. 1997). The focus has therefore been on the ancestors of C. arabica. However, further studies are needed to unravel the genomic and epigenomic outcomes of the C. arabica allopolyploid condition.

There is a consensus that C. arabica has a polyploid origin from a cross between two diploids with similar genomes, but the classification of its genomic origin has been debated. Initially, C. arabica was classified as a natural segmental allotetraploid (Orozco-Castillo et al. 1996; Pinto-Maglio and Cruz 1998). ‘Segmental allopolyploid’ in this context was based on the Stebbins (1949) definition: a type of allopolyploid that contains two partially differentiated genomes and that originated from hybridization between species close enough to allow partial pairing between homoeologous chromosomes. Thus, segmental allopolyploids are intermediates between autopolyploids and true (or genomic) allopolyploids, as the differentiation between the progenitor genomes is insufficient for complete allopolyploidy. Exceptionally, the allotetraploid C. arabica exhibits a stable diploid-like meiotic behavior (Mendelian segregation). This might be possible owing to the occurrence of a genetic system in which pairing between homoeologous chromosomes is avoided (Pinto-Maglio and Cruz 1998), such as the Ph gene (homoeologous pairing suppressor) found in Triticum L., Avena L., Festuca L., Gossypium L., Nicotiana L. and Lolium L. Further molecular evidence on this hypothesis suggested that the absence of homoeologous pairing in C. arabica is not a consequence of structural differentiation between the two parental genomes, but rather the effect of one or several pairing-regulating genes, which could be similar to Ph (Lashermes et al. 2000).

C. arabica was also hypothesized to be an amphidiploid species formed from a cross between C. eugenioides as the female progenitor and C. canephora, or its related ecotypes, as the male progenitor. In addition, the origin of C. arabica is considered recent because of the low level of divergence between the two constitutive genomes of this species and the related parental genomes (Lashermes et al. 1999). Amphidiploid refers to segmental allotetraploids that have gone through an event of WGD after hybridization (Stebbins 1949). In the absence of homoeologous pairing, the WGD event may restore the fertility of the hybrid (homoploid), since the presence of two copies of each genome would enable regular pairing (bivalents) during meiosis I. The WGD event that gave rise to the fertile allotetraploid ancestors of C. arabica possibly involved either chromosome set doubling in a diploid interspecific hybrid or backcrossing of a spontaneous triploid (Lashermes et al. 1999).

The true allotetraploid nature of C. arabica was reinforced by classical cytogenetics and chromosomal image cytometry (Box 2) analyses (Clarindo and Carvalho 2008, 2009), corroborating data based on molecular markers and GISH. The comparison with the karyotypes of two potential progenitors, C. canephora and C. congensis, revealed the presence of chromosomes identical to those of C. arabica in both species. In addition, C. arabica also exhibits a small acrocentric chromosome pair that is not present in either of these two species. Taken together, these results also support the idea that only one of them participated in the origin of C. arabica (Clarindo and Carvalho 2009; Clarindo et al. 2012).

Cytogenetic evidence concerning the origin of C. arabica was obtained from the SC/NOR chromosomes. Three pachytene chromosomes of C. arabica (14, 20 and 21) have an SC (Pinto-Maglio and Cruz 1998), which was confirmed by 18S rDNA signals in mitotic metaphases (Hamon et al. 2009). C. canephora and C. congensis exhibit a single SC in the short arm of chromosome 6 (Clarindo et al. 2012), also confirmed by 18S rDNA (Hamon et al. 2009). Therefore, one of the SC/NOR chromosomes in the C. arabica karyotype was probably derived from the canephoroid progenitor (most likely C. canephora), while the other two were possibly inherited from C. eugenioides.

Analysis of the divergence of orthologous coding sequences also supported the evolutionary history of C. arabica as an allotetraploid originating from a natural hybridization between C. canephora and C. eugenioides. Thus, the genome of C. arabica was represented by the genome formula CaCaEaEa (Lashermes et al. 1999; Yu et al. 2011). The most accurate estimate for the time of this natural hybridization is between 0.54 and 1.08 Mya (Bawin et al. 2020). Although the current geographical range of C. arabica does not overlap with that of any other Coffea species, including its possible progenitors, pollen records and lake sediment cores from the Congo basin and East Africa indicate that the Afromontane rainforests regularly expanded to lower altitudes during glacial periods between 0.60 and 1.05 Mya. Consistent with this, C. canephora and C. eugenioides occurred in a contact zone in this area. The changing environmental conditions might have also played a role in this speciation event, weakening interspecific reproductive barriers between the diploid parental species of C. arabica. In addition, the aridification of East Africa since 0.575 Mya is one of the possible explanations for the current non-overlapping distribution areas of C. arabica, C. eugenioides and C. canephora (Owen et al. 2018; Bawin et al. 2020).

A comparison of C. canephora and C. arabica ‘Tall Mokka’ based on BAC clones (~ 140–160 kb) bearing the aforementioned orthologous coding sequences aimed to show the outcomes of C. arabica allopolyploidy from a cytogenomic perspective. Despite the high degree of sequence conservation in coding regions, genomic differences were found. Major chromosomal rearrangements were observed in the intergenic regions of these BACs, including a paracentric inversion between homoeologous regions within C. arabica (Ca and Ea). As C. eugenioides was not included in this study, it was not possible to determine whether the chromosomal inversion occurred in C. arabica or was inherited from one of the two diploid progenitors. Therefore, the inversion might not be a consequence of C. arabica allopolyploidy. Moreover, the specific insertion of a Ty1-copia retrotransposon in the Ca subgenome of C. arabica was also reported, which might be related to the burst in TE activity (Yu et al. 2011).

Allotriploidy of the Timor hybrid

In addition to the polyploidization event that gave rise to the well-established allotetraploid species C. arabica, another recent allopolyploidy event in the Coffea genus originated the hybrid named HT, or the Timor hybrid. HT arose in a plantation of C. arabica ‘Typica’ established around 1917/18 on Timor Island. All accessions of this hybrid originated from this single plant, or from backcrosses between this plant and its progenitor C. arabica (Gonçalves et al. 1978). Besides its remarkable relevance as a source of resistance genes in coffee breeding, this hybrid has also gained attention for its very recent allopolyploid origin of ~ 100 years (Gonçalves et al. 1978; Capucho et al. 2009). HT ‘CIFC 4106’, which has been vegetatively propagated, is a triploid hybrid with a chromosome number of 2n = 3x = 33 and a nuclear DNA content of 1C = 2.10 pg. Therefore, we considered that this accession represents the first HT plant found on Timor Island.

In summary, the hybridization/polyploidization events involving C. canephora, C. eugenioides, HT and C. arabica might be explained as follows. The allotetraploid C. arabica originated between 0.543 and 1.08 Mya (Bawin et al. 2020) from the fusion between a reduced reproductive cell from C. canephora (n = x = 11) and another from C. eugenioides (n = x = 11, Lashermes et al. 1999). After a polyploidization event in the sterile homoploid hybrid, fertility would have been restored, resulting in the fertile allotetraploid with 2n = 4x = 44 chromosomes, 22 from C. canephora and 22 from C. eugenioides (Ca and Ea subgenomes, respectively). A natural backcross ~ 100 years ago occurred between C. arabica and its progenitor C. canephora, involving the fusion of reduced reproductive cells from both species and resulting in HT with 2n = 3x = 33 chromosomes. Therefore, our research group recently proposed the hypothesis that the genome of HT ‘CIFC 4106’ is represented by the formula CCaEa. To provide more information regarding the karyotypic evolution of C. eugenioides, C. canephora, C. arabica and HT ‘CIFC 4106’, we have been combining classical and molecular cytogenetics, as well as flow cytometry for nuclear DNA content measurements (Box 2).
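The chromosome arithmetic of this scenario can be followed in a short sketch. Only the base number x = 11 and the genome formulas come from the text; the function names and the Counter-based bookkeeping of subgenome copies are illustrative assumptions, and the sketch ignores any chromosome loss or rearrangement after hybridization.

```python
from collections import Counter

X = 11  # base chromosome number of Coffea, as stated in the text


def reduced_gamete(genome):
    """Halve every subgenome copy number (assumes even copy numbers)."""
    return Counter({label: copies // 2 for label, copies in genome.items()})


def fuse(gamete_a, gamete_b):
    """Fuse two gametes into one genome by summing subgenome copies."""
    return gamete_a + gamete_b


def double(genome):
    """Whole-genome duplication: every subgenome copy number doubles."""
    return Counter({label: 2 * copies for label, copies in genome.items()})


def chromosome_number(genome):
    """Somatic chromosome number for a genome made of x = 11 sets."""
    return X * sum(genome.values())


# Diploid progenitors (genome formulas CC and EE).
canephora = Counter({"C": 2})
eugenioides = Counter({"E": 2})

# Step 1: reduced gametes fuse into a sterile homoploid hybrid (CE).
homoploid = fuse(reduced_gamete(canephora), reduced_gamete(eugenioides))

# Step 2: polyploidization restores fertility; relabel as Ca and Ea.
doubled = double(homoploid)
arabica = Counter({"Ca": doubled["C"], "Ea": doubled["E"]})

# Step 3: backcross of C. arabica (CaEa gamete) with C. canephora (C gamete).
timor_hybrid = fuse(reduced_gamete(arabica), reduced_gamete(canephora))

for name, genome in [("homoploid", homoploid),
                     ("C. arabica", arabica),
                     ("HT", timor_hybrid)]:
    print(name, dict(genome), "2n =", chromosome_number(genome))
```

Under these assumptions the three steps give 22, 44 and 33 chromosomes, matching the counts reported for the homoploid hybrid, C. arabica and HT, respectively.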

The six chromosome pairs presented by HT ‘CIFC 4106’ (2–3, 10–11, 12–13, 20–21, 25–26, 28–29) might represent the CCa subgenomes, with C from C. canephora and Ca from C. arabica (Figs. 2, 3). The HT ‘CIFC 4106’ karyogram shows that the possible pairs 2–3, 10–11, 12–13, 20–21, 25–26 and 28–29 may represent chromosomes 1, 4, 5, 7, 9 and 11 of C. canephora, respectively (Fig. 3). The HT ‘CIFC 4106’ group 15–18 is similar to pairs 6 and 7 of C. canephora (Fig. 3). Further evidence supporting the CCaEa genome hypothesis, specifically regarding the E genome, is that chromosomes 1, 4 and 9 of HT ‘CIFC 4106’ are morphometrically similar to chromosomes 1, 2 and 3 of C. eugenioides, respectively. In addition, chromosomes 7 and 11 of C. eugenioides are similar to chromosomes 19 and 33 of HT ‘CIFC 4106’, respectively (Fig. 3).

Fig. 3

Karyograms obtained from metaphase chromosomes of: a C. eugenioides, exhibiting 2 metacentric chromosome pairs (7, 10), 9 submetacentric (1–6, 8, 9 and 11) and 2 chromosome pairs with SC (3 and 5); b C. canephora, with 2 metacentric chromosome pairs (4 and 9), 9 submetacentric (1–3, 5–8, 10 and 11) and 1 with SC (chromosome 6); c C. arabica, exhibiting 5 metacentric chromosome pairs (7, 8, 13, 14 and 20), 16 submetacentric (1–6, 9–12, 15–19 and 21) and 1 acrocentric pair (22); and d HT ‘CIFC 4106’, possessing 6 metacentric (10, 11, 19, 25, 26, 30) and 27 submetacentric chromosomes (1–9, 12–18, 20–24, 27–29, 31–33). Bar = 5 μm

Our new 5S rDNA mapping data for C. eugenioides and C. canephora show that each diploid exhibits one 5S rDNA locus: in the interstitial region of the long arm of chromosome 4 in C. eugenioides, and in the pericentromeric portion of the long arm of chromosome 8 in C. canephora. The allotriploid HT ‘CIFC 4106’ shows two loci located interstitially in the long arm of chromosomes 9 and 13 (Fig. 4). The association of these data with chromosome class provides additional evidence for the origin of HT, as C. eugenioides chromosome 4 and HT ‘CIFC 4106’ chromosome 9 both exhibit one 5S rDNA locus in the same position. Therefore, HT ‘CIFC 4106’ chromosome 9 was probably inherited from a reduced cell of C. arabica (CaEa), specifically from the Ea subgenome. In addition, the nuclear DNA content of HT ‘CIFC 4106’ (1C = 2.10 pg, Clarindo et al. 2013) is equivalent to the sum of the mean 2C nuclear genome size of C. canephora (2C = 1.41 pg, 1C = 0.705 pg, Clarindo and Carvalho 2009) and the 1C nuclear value of C. eugenioides (2C = 1.38 pg, 1C = 0.690 pg, Sanglard et al. 2019). Likewise, the HT ‘CIFC 4106’ chromosome number (2n = 3x = 33) corresponds to the fusion of one reproductive cell of C. arabica (CaEa; n = 2x = 22) and one of C. canephora (C; n = x = 11). These data also support the CCaEa hypothesis. A summary of the main cytogenetic features and evolutionary relations among the two allopolyploid Coffea and their parental diploids is depicted in Fig. 5.
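The DNA content argument above is a simple additive check, sketched below. The pg values are those cited in the text; the function name is hypothetical, and the key assumption, labeled in the code, is that the C and Ca genome copies each match one C. canephora monoploid genome and that the Ea copy matches one C. eugenioides monoploid genome, with no sequence gain or loss after hybridization.

```python
# Nuclear DNA contents cited in the text (pg).
CANEPHORA_2C = 1.41      # Clarindo and Carvalho 2009
EUGENIOIDES_2C = 1.38    # Sanglard et al. 2019
HT_REPORTED = 2.10       # Clarindo et al. 2013, HT 'CIFC 4106'


def expected_cca_ea_content(canephora_2c, eugenioides_2c):
    """Expected DNA content (pg) of a CCaEa genome.

    Assumption (not from the source data): C and Ca each contribute one
    C. canephora monoploid genome, Ea contributes one C. eugenioides
    monoploid genome, and contents are strictly additive.
    """
    two_canephora_copies = canephora_2c      # C + Ca
    one_eugenioides_copy = eugenioides_2c / 2  # Ea
    return two_canephora_copies + one_eugenioides_copy


expected = expected_cca_ea_content(CANEPHORA_2C, EUGENIOIDES_2C)
difference = abs(expected - HT_REPORTED)
print(f"expected = {expected:.2f} pg, reported = {HT_REPORTED:.2f} pg, "
      f"difference = {difference:.3f} pg")
```

A close match between the expected and reported values is consistent with the CCaEa formula, although it cannot by itself exclude other combinations with a similar total DNA content.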

Fig. 4

FISH mapping of 5S rDNA genes (red) on Coffea metaphase chromosomes using probes labeled with tetramethyl-rhodamine 5-dUTP. Chromosomes were counterstained with 4’,6-diamidino-2-phenylindole dihydrochloride (DAPI) (blue). a C. eugenioides with the 5S rDNA signal at the interstitial region of chromosome 4 long arm. b C. canephora exhibited a pericentromeric signal in the long arm of chromosome 8. c HT ‘CIFC 4106’ has two positive 5S rDNA signals, one in chromosome 9 and the other in chromosome 13, both interstitial. Bars = 5 μm

Fig. 5

Summary of the main cytogenetic features, with representative idiograms, of two Coffea allopolyploids, C. arabica and HT ‘CIFC 4106’, and the diploids C. canephora and C. eugenioides, as well as their evolutionary relations. The data presented for these Coffea were obtained from Pinto-Maglio and Cruz (1998), Noirot et al. (2003), Clarindo and Carvalho (2006, 2009), Hamon et al. (2009), Bawin et al. (2020) and from the present work. The following are reported for each Coffea: 1C nuclear DNA content measured by flow cytometry; 2n chromosome number; ploidy level; class of each chromosome of the karyotype (M metacentric, SM submetacentric, A acrocentric); number and identity of chromosomes with secondary constrictions (SC); number of 18S (green) and 5S (red) rDNA sites

Natural neoallopolyploids are valuable materials intensively used as evolutionary models in plant polyploidy research, such as Spartina anglica C. E. Hubb. (Ainouche et al. 2004), Senecio cambrensis Rosser, Senecio eboracensis Abbott & Lowe (Abbott and Lowe 2004), Tragopogon mirus Ownbey, Tragopogon miscellus Ownbey (Soltis et al. 2004), Cardamine × schulzii Urbanska-Worytkiewicz (Urbanska et al. 1997) and Mimulus peregrinus Vallejo-Marín (Vallejo-Marín 2012). As HT is a recent allotriploid hybrid that originated only ~ 100 years ago, a refined study of its karyotype, in comparison with that of its progenitor C. arabica, would provide substantial information on the genomic rearrangements associated with interspecific hybridization and polyploidy. Although it is not an established species, owing to its allotriploid condition and meiotic irregularities, HT may also be considered an interesting asset for evolutionary research on plant polyploidy in the context of natural populations.

Concluding remarks

Advances in Coffea cytogenetics over the years have been remarkable. Thus far, the development of improved protocols for classical and molecular cytogenetics, alongside genomics, has allowed a deeper understanding of the structure, organization and evolution of Coffea genomes. The relevance of these data also reaches applied research, with flow cytometry and karyotyping being used, for instance, to distinguish species, varieties and hybrids, including HT cytotypes. The potential role of cytogenetics and flow/image cytometry in aiding the assembly of Coffea genomes is also worthy of notice (Box 2). The aim of any genome sequencing project is to achieve a chromosome-level assembly, with each scaffold assigned to and oriented on a chromosome. Therefore, refining the assignment of sequence contigs to chromosomes often requires integrating the complementary data obtained from sequencing technologies and molecular cytogenetic mapping. Another relevant parameter, sequencing coverage, relies on a reliable measurement of the total genome size of the target species, for which flow cytometry has been considered the ‘gold standard technique’. In addition to nuclear genome size, both flow and image cytometry also allow measuring the size of each chromosome of the karyotype, which helps to estimate coverage at the chromosome level.
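The dependence of sequencing coverage on a cytometric genome size estimate can be illustrated with a minimal sketch. The C. canephora 1C value is the one cited in this review; the pg-to-bp conversion factor (1 pg = 0.978 × 10^9 bp, Doležel et al. 2003) is not taken from this review, and the function names, the sequencing yield and the chromosome fraction are hypothetical values chosen only for illustration.

```python
BP_PER_PG = 0.978e9  # conversion factor, 1 pg of DNA = 0.978e9 bp (assumed here)


def genome_size_bp(c_value_pg):
    """Convert a 1C nuclear DNA content in pg to base pairs."""
    return c_value_pg * BP_PER_PG


def coverage(total_sequenced_bases, c_value_pg):
    """Mean sequencing depth: sequenced bases divided by 1C genome size."""
    return total_sequenced_bases / genome_size_bp(c_value_pg)


def chromosome_size_bp(c_value_pg, chromosome_fraction):
    """Size of one chromosome, given its share of the 1C DNA content.

    chromosome_fraction would come from flow or image cytometry of the
    karyotype; the value used below is purely hypothetical.
    """
    return genome_size_bp(c_value_pg) * chromosome_fraction


canephora_1c = 0.705        # pg, C. canephora value cited in this review
sequencing_yield = 50e9     # bp, hypothetical sequencing run
example_fraction = 0.10     # hypothetical share of one chromosome

print(f"1C genome size: {genome_size_bp(canephora_1c) / 1e6:.1f} Mbp")
print(f"mean coverage:  {coverage(sequencing_yield, canephora_1c):.1f}x")
print(f"chromosome:     "
      f"{chromosome_size_bp(canephora_1c, example_fraction) / 1e6:.1f} Mbp")
```

Because the genome size sits in the denominator, an error in the cytometric estimate propagates proportionally into the expected coverage, which is one reason a reliable genome size measurement matters for planning a sequencing project.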

The increase in sequencing information will also allow a deeper analysis of Coffea evolution, and its integration with molecular cytogenetics is fundamental to associate the ex situ with the in situ localization of DNA sequences. Nonetheless, obtaining mitotic metaphases and pachytene chromosomes from different species of Coffea, especially wild species with a restricted geographical distribution, will remain one of the main challenges for cytogenomics. Partial genome sequence analysis has already revealed interesting patterns regarding the association of mobile elements with Coffea diversification, and the recent publication of a new genome sequence from the wild C. humblotiana will open new fields for comparative analysis. We also believe that the allotriploid HT is an interesting target for cytogenomics, especially for studying the early genomic dynamics after an allopolyploidization event, such as the TE burst, sequence losses or gains and global effects on epigenetic regulation. Therefore, we hope that the continuous improvement in techniques and a closer integration of different research areas will lay the foundation for future ground-breaking discoveries about the Coffea genome.

Author contribution statement

All authors contributed equally to this work and approved the final manuscript version for submission.