1 Introduction: Once Upon a Time – The Story of Gene Expression in Coffee Plants

Despite the economic importance of coffee in the international market, knowledge of coffee molecular biology, particularly regarding gene cloning and expression, is relatively recent. The first coffee genes described in the literature correspond to the complementary DNA (cDNA) sequences of α-galactosidase (Zhu and Goldstein 1994) and a metallothionein I-like protein (Moisyadi and Stiles 1995), the first being a Short Communication in Gene and the second a Plant Gene Register in Plant Physiology. Both articles reported only the cloning of these cDNAs, without analyzing the expression of the corresponding genes in coffee tissues.

This was the situation when I arrived at the Nestlé-Tours Research and Development Centre to initiate a project aiming to identify genes involved in coffee cup quality. Given the extensive research describing the importance of storage proteins (particularly in cereals) for the quality of final products, our interest logically focused first on characterizing these proteins in coffee fruits. In 1999, we published the first article describing the expression of the csp1 (coffee storage protein) gene coding the 11S proteins accumulated during bean development (Rogers et al. 1999a). At that time, gene expression studies were always performed by Northern blot experiments, which required both large quantities of total RNA and the prior cloning of the studied genes in order to synthesize the corresponding radiolabeled DNA probes. This situation persisted until the beginning of the 2000s, and in 2004 there were only 1,570 nucleotide sequences and 115 proteins from coffee deposited in the GenBank/EMBL databases.

A few years later, with the development of high-throughput sequencing techniques, the first coffee EST (“expressed sequence tag”) sequencing projects were carried out, and in 2016 there were 35,153; 25,574; and 25,574 unigenes available in public databases for Coffea arabica, Coffea canephora, and Coffea eugenioides, respectively. Then, the development of real-time quantitative PCR (RT-qPCR) technology significantly increased the number of coffee gene expression studies. Access to these ESTs also made it possible to set up a 15 K microarray (“PUCECAFE”) DNA chip, which was used to perform the first large-scale expression analyses aiming to understand transcription networks in flowers, mature beans, and leaves of C. canephora, C. eugenioides, and C. arabica (Privat et al. 2011). The same chip was also used to analyze the leaf expression of homeologous genes in response to changing temperature in C. arabica and its two ancestral parents, C. canephora and C. eugenioides (Bardil et al. 2011).

Soon after came the next-generation Illumina RNA sequencing (RNAseq) method, which makes it possible to analyze the expression of thousands of genes by in silico approaches. The first article using such techniques was published by Combes et al. (2013), who studied the transcriptome in leaves of C. arabica subjected to warm and cold conditions suitable to C. canephora and C. eugenioides, respectively. Since this work, numerous other RNAseq studies have been published, and many others are currently ongoing. Using all these data, it is now possible to generate reference transcriptomes, which should help us identify candidate genes (CGs) correlated with agronomic and quality traits in coffee.

2 Coffee Gene Expression

2.1 Reference Genes for qPCR Experiments

Since the development of EST sequencing projects (for reviews, see Lashermes et al. 2008; de Kochko et al. 2010, 2017; and Tran et al. 2016), RT-qPCR experiments, using either SYBR Green fluorochrome or specific TaqMan probes, are now routinely used to study coffee gene expression. To quantify expression levels, these experiments require endogenous reference genes (as internal controls), which must first be validated for the particular tissues studied (Bustin 2002; Bustin et al. 2009). Accordingly, several articles have been published to identify the best reference genes for different coffee tissues and growth conditions.

The first, published in 2009, showed that GAPDH (coding the glyceraldehyde 3-phosphate dehydrogenase) and UBQ10 (coding ubiquitin) were stable reference genes for normalization of qPCR experiments in different tissues of C. arabica, particularly in leaves and roots under drought stress (Barsalobres-Cavallari et al. 2009; Cruz et al. 2009). These two genes are also the most suitable for data normalization when analyzing multiple or single stresses in leaves of C. arabica and C. canephora (Goulao et al. 2012). In another study, Fernandes-Brum et al. (2017a) showed that the most stable reference genes were AP47 (coding the clathrin adaptor protein medium subunit), UBQ (ubiquitin 60S), RPL39 (ribosomal protein L39), and EF1α (elongation factor 1-alpha) in all tissues of C. arabica, while GAPDH and UBQ, together with ADH2 (class III alcohol dehydrogenase) and ACT (β-actin), were the most stable for all tissues of C. canephora.
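To make the role of reference genes concrete, the sketch below shows one common way of normalizing a target gene against one or more references, the 2^-ΔΔCt calculation. It is only an illustration: the function name, the Ct values, and the assumption of roughly 100% PCR efficiency are ours and are not taken from the studies cited above.

```python
from statistics import mean


def relative_expression(ct_target, ct_refs, ct_target_cal, ct_refs_cal):
    """Fold change of a target gene relative to a calibrator sample (2^-ddCt).

    ct_target / ct_target_cal: Ct of the target gene in the test and
    calibrator samples. ct_refs / ct_refs_cal: Ct values of the reference
    genes in the same two samples. Averaging Ct values corresponds to a
    geometric mean of reference quantities; ~100% PCR efficiency is assumed.
    """
    d_ct_sample = ct_target - mean(ct_refs)
    d_ct_calibrator = ct_target_cal - mean(ct_refs_cal)
    return 2.0 ** -(d_ct_sample - d_ct_calibrator)


# Hypothetical Ct values (not taken from any cited study): one target gene
# and two reference genes in a stressed sample vs. a control sample.
fold = relative_expression(22.0, [18.0, 20.0], 25.0, [18.0, 20.0])
print(fold)  # 8.0
```

If the reference genes themselves respond to the treatment, the computed fold change is distorted, which is why reference genes have to be validated for each tissue and condition.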

When analyzing the caffeine biosynthetic pathway, Sreedharan et al. (2018) showed that GAPDH and UBQ were the reference genes presenting the lowest variability in leaves and developing endosperm of C. canephora between control samples and treatments with salicylic acid (SA), methyl jasmonate (MeJA), light exposure, and PEG, which permitted the quantification of xanthosine methyltransferase (NMT) coding genes. In fact, UBQ was commonly used as a reference gene to normalize expression studies during bean development (Salmona et al. 2008; Joët et al. 2009, 2010, 2014; Cotta et al. 2014; Dussert et al. 2018) as well as in other coffee tissues, such as leaves and flower buds (Marraccini et al. 2011, 2012; Vieira et al. 2013; Mofatto et al. 2016). Even though several studies reported that RPL39 was not the most accurate reference (Cruz et al. 2009; de Carvalho et al. 2013), this gene was also used as a reference to compare expression profiles of several genes in developing beans and also in different organs such as leaves, stems, branches, roots, and flowers (Lepelley et al. 2007, 2012a, b; Pré et al. 2008; Privat et al. 2008; Simkin et al. 2006, 2008; Bottcher et al. 2011).

On the other hand, GAPDH and UBQ appeared to be the least stable reference genes for transcript normalization in C. arabica hypocotyls inoculated with Colletotrichum kahawae (the causal agent of coffee berry disease [CBD]), for which the use of IDE (coding insulin degrading enzyme) and β-Tub9 (coding β-tubulin) as references is recommended (Figueiredo et al. 2013). In another study, de Carvalho et al. (2013) showed that GAPDH, together with MDH (coding malate dehydrogenase) and EF1α, can be used as reference genes in leaves and roots of C. arabica subjected to N-starvation and heat stress, while UBQ10 was the most suitable reference for salt stress treatments. Using RefFinder, a web-based tool integrating the geNorm, NormFinder, and BestKeeper programs (Xie et al. 2012), Martins et al. (2017) showed that MDH presented the highest mRNA stability for studying leaf gene expression in both C. arabica and C. canephora subjected to single or multiple abiotic stresses such as elevated temperature and CO2 concentration ([CO2]). In another work, Freitas et al. (2017) showed that the 24S (ribosomal protein 24S) and PP2A (protein phosphatase 2A) genes were the most suitable references for studying expression in embryogenic and non-embryogenic calli, embryogenic cell suspensions, and somatic embryos at different developmental stages in C. arabica.
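The idea of ranking candidate reference genes by stability can be illustrated with a simplified, geNorm-style M value: for each gene, the average standard deviation of its log2 expression ratios against every other candidate across samples. This is a sketch of the pairwise-variation principle only, not a reimplementation of geNorm or RefFinder, and the quantities below are invented for illustration.

```python
from math import log2
from statistics import mean, stdev


def genorm_m_values(quantities):
    """Return a geNorm-style stability value M for each candidate gene.

    quantities: dict mapping gene name -> list of relative quantities
    (linear scale, same sample order for every gene). Lower M = more stable.
    """
    m_values = {}
    for gene, values in quantities.items():
        pairwise_sd = []
        for other, other_values in quantities.items():
            if other == gene:
                continue
            # Variation of the log2 ratio between the two genes across samples
            ratios = [log2(a / b) for a, b in zip(values, other_values)]
            pairwise_sd.append(stdev(ratios))
        m_values[gene] = mean(pairwise_sd)
    return m_values


# Hypothetical relative quantities in three samples (illustrative only)
candidates = {
    "GAPDH": [1.0, 2.0, 4.0],
    "UBQ10": [1.0, 2.0, 4.0],
    "ACT": [1.0, 4.0, 1.0],
}
m = genorm_m_values(candidates)
for gene in sorted(m, key=m.get):
    print(gene, round(m[gene], 3))  # most stable candidates are listed first
```

In this toy data set the two genes that vary in parallel obtain the lowest M, while the gene that fluctuates independently is ranked last. The ranking is always relative to the candidate set and to the samples tested, which is consistent with different studies recommending different reference genes.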

2.2 Gene Expression in Coffee Species

At the time of writing this review (I apologize if I have omitted some studies), the number of genes for which expression studies had been carried out individually was around 700. Most of these studies were performed by RT-qPCR using specific primer pairs designed against coffee ESTs generated by sequencing projects. In chronological order, the first project was the Nestlé and Cornell initiative, which generated around 63,000 ESTs from six cDNA libraries from fruits and leaves (at different developmental stages) of C. canephora clones of the Indonesian Coffee and Cocoa Research Institute (ICCRI) (Lin et al. 2005). Next was the IRD project, which yielded more than 10,400 ESTs, also from fruits and leaves of C. canephora (Poncet et al. 2006). Finally, the “Brazilian Coffee Genome” Project (BCGP), coordinated by UNICAMP (University of Campinas) and Embrapa (Empresa Brasileira de Pesquisa Agropecuária), produced more than 200,000 ESTs (Vieira et al. 2006; Mondego et al. 2011) from C. arabica (≈187,000), C. canephora (≈15,500), and C. racemosa (≈10,500). To identify as many genes as possible, this project used 43 cDNA libraries; most were built from transcripts extracted from fruits and leaves at different developmental stages, but others came from different plant organs (flowers, roots) and tissues (calli, cell suspensions, etc.) subjected to various biotic (e.g., roots infected with nematodes, stems infected with Xylella spp., leaves infected with the leaf miner Leucoptera coffeella and the rust fungus Hemileia vastatrix) and abiotic (e.g., suspension cells treated with NaCl and chemicals such as acibenzolar-S-methyl and brassinosteroids) stresses.

As reported in Tables 1, 2, and 3, most of these expression studies were performed in C. arabica (n ≈ 550 genes) and C. canephora (n ≈ 100 genes), with a distribution reflecting quite well the importance of C. arabica (59%) and C. canephora (41%) in worldwide coffee production (ICO 2020). Expression studies were more limited in other coffee species such as C. racemosa (n = 25), C. eugenioides (n = 18), and C. liberica (n = 5). For 70 genes, expression analyses were performed in both C. arabica and C. canephora. However, only a limited number of studies (described in Sect. 2.3) analyzed gene expression simultaneously in C. arabica, C. canephora, and C. eugenioides using qPCR with primers specific to each homeolog in each species. Several articles also reported in silico gene expression profiles that were not confirmed by RT-qPCR (Table 4).

Table 1 List of coffee genes studied at the transcriptional level
Table 2 List of coffee genes studied at the transcriptional level
Table 3 List of high-throughput expression studies
Table 4 Expression studies performed in silico (without checking gene expression by RT-qPCR)

2.3 Coffee Gene Expression in C. arabica: A Tricky Case

Before discussing gene expression in coffee, it is important to remember that C. arabica (2n = 4× = 44) is an allotetraploid species derived from a natural hybridization event between the two diploid (2n = 2× = 22) species C. canephora and C. eugenioides (Lashermes et al. 1999), which occurred approximately 10,000–50,000 years ago (Cenci et al. 2012). Consequently, the transcriptome of C. arabica is a mixture of transcripts expressed from homeologous genes harbored by its two sub-genomes, namely CaCc (also referred to as Ca) for the C. canephora sub-genome and CaCe (also referred to as Ea) for the C. eugenioides sub-genome.

In the first attempt to analyze the contribution of each sub-genome to gene expression in C. arabica, Vidal et al. (2010) used qPCR coupled with the allele-specific TaqMAMA-based method (Li et al. 2004) and developed a pipeline to find SNP (single nucleotide polymorphism) haplotypes of CaCc and CaCe homeologs in the ESTs of the BCGP. Of the 2,069 contigs studied, these authors observed biased expression for 22% of them, with 10% overexpressing CaCc homeologs and 12% overexpressing CaCe homeologs, thereby showing that the two sub-genomes do not contribute equally to the transcriptome of C. arabica. By analyzing gene ontology (GO), these authors also proposed that the CaCe sub-genome expresses genes coding proteins involved in basal biological processes (such as those related to photosynthesis, carbohydrate metabolic processes, aerobic respiration, and phosphorylation). On the other hand, the CaCc sub-genome would contribute to adjusting Arabica expression (e.g., in response to biotic and abiotic stresses) through the expression of genes coding regulatory proteins such as those related to hormone stimuli (mainly auxin), GTP signal transduction, translation, ribosome biogenesis, and proteasome activity.

The 15 K “PUCECAFE” microarray (Privat et al. 2011) was also used to perform a genome-wide expression study analyzing the effects of warm and cold temperatures on leaf gene expression in C. arabica and in its two ancestral parents (C. canephora and C. eugenioides) (Bardil et al. 2011). Even though this global gene expression analysis did not make it possible to determine the relative contributions of homeologs to the C. arabica leaf transcriptome, it revealed the existence of transcription profile divergences between the allopolyploid and its parental species that were greatly affected by growth temperature. Two other “in silico” analyses studied the effects of warm vs. cold temperature in C. arabica. The first one used SNP ratio quantification to monitor the relative expression of 13 homeologous gene pairs in five organs (cotyledons, young leaves, leaves, stems, and roots) in addition to warm/cold temperatures (Combes et al. 2012). No case of gene silencing or organ-specific silencing was detected, but 10 out of the 13 sampled genes showed biased expression: 4 genes toward CaCe, 4 genes toward CaCc, and 2 genes toward CaCe or CaCc depending on the organ considered. In the second study, the effects of warm/cold temperatures on the C. arabica leaf transcriptome were analyzed by RNA sequencing (Combes et al. 2013). The relative homeologous gene expression, assessed in 9,959 and 10,628 pairs of homeologs in warm and cold growing conditions, respectively, revealed that 65% of these genes had an equivalent expression level, while the rest (35%) showed biased homeologous expression. Although the warm and cold conditions were suitable for the C. canephora and C. eugenioides parental species, respectively, neither sub-genome appeared preferentially expressed in the final transcriptome of C. arabica.
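The principle of calling homeologous expression “biased” or “equivalent” from SNP-discriminated reads can be sketched as follows. This is a hypothetical illustration: the exact binomial test against a 50:50 expectation, the 0.05 threshold, the function names, and the read counts are our assumptions and do not reproduce the statistical procedures of the studies cited above.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value (sums outcomes no more likely than k).

    Intended for modest read counts; very large n would need a
    log-space or library implementation.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


def homeolog_bias(reads_cc, reads_ce, alpha=0.05):
    """Classify one homeolog pair from reads assigned to CaCc and CaCe.

    Requires at least one read in total. Returns the CaCc fraction,
    the p-value, and a call.
    """
    total = reads_cc + reads_ce
    cc_fraction = reads_cc / total
    p_value = binomial_two_sided_p(reads_cc, total)
    if p_value >= alpha:
        call = "balanced"
    else:
        call = "CaCc-biased" if cc_fraction > 0.5 else "CaCe-biased"
    return cc_fraction, p_value, call


# Hypothetical read counts for three homeolog pairs (illustrative only)
for cc, ce in [(50, 50), (90, 10), (10, 90)]:
    print(cc, ce, homeolog_bias(cc, ce)[2])
```

In a genome-wide setting with thousands of pairs, the p-values would also need correction for multiple testing, which is omitted here for brevity.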

Because CaCc and CaCe sub-genomes of C. arabica have low sequence divergence (with an average difference for genes of only 1.3%) (Cenci et al. 2012), we can conclude that all the studies analyzing gene expression in C. arabica by “wet lab” approaches (e.g., Northern blot experiments for the most ancient and even RT-qPCR using primer pairs probably designed in highly conserved cDNA regions) quantify the transcripts expressed by both CaCc and CaCe sub-genomes.

However, a few studies succeeded in specifically discriminating the expression of CaCc and CaCe homeologs in C. arabica. All of them (described below) used SNPs or small insertions and deletions (INDELs), for example, those present in the 3′ and 5′ untranslated regions (UTRs), to design CaCc- and CaCe-specific primer pairs, which made it possible to identify homeologous differential expression (HDE) by qPCR. The first one concerned the expression of the CaWRKY1a (CaCc) and CaWRKY1b (CaCe) genes in C. arabica (Petitot et al. 2008, 2013), coding transcription factors known to be associated with plant defense responses to biotic and abiotic stresses (reviewed in Ülker and Somssich 2004; Eulgem 2006). In this species, both homeologs were concomitantly expressed in leaves and roots under all treatments (salicylic acid and infection by leaf rust [H. vastatrix] and the root-knot nematode [RKN] Meloidogyne exigua), suggesting that they undergo the same transcriptional control.

A different situation was observed in C. arabica for the RBCS1 gene, with the predominant expression of the CaCe homeolog (over the CaCc homeolog) in the leaves of non-introgressed (“pure”) cultivars such as Typica, Bourbon, and Catuaí (Marraccini et al. 2011), suggesting that specific suppression of RBCS1 CaCc expression occurred during the evolutionary processes that generated the C. arabica species. This situation fits the concept of genome dominance (or genome expression dominance), in which the total expression of the homeologs of a given gene in an allopolyploid is statistically the same as that of only one of the parents (Grover et al. 2012). However, RBCS1 CaCe and CaCc homeologs were co-expressed (with the same order of magnitude) in the leaves of the C. arabica Timor hybrid HT832/2 (used to create IAPAR59), of the Tupi and Obatã cultivars of C. arabica, for example, as well as of Icatú, which comes from a cross between C. canephora and C. arabica Bourbon. For all these “introgressed” Arabica cultivars, CaCc expression was always higher than CaCe expression. The existence of a bias in favor of CaCc homeologs suggests that one (or several) genetic factors of C. canephora were introgressed in C. arabica together with the HdT (hybrid of Timor, a spontaneous hybrid between C. arabica and C. canephora) genes conferring resistance to leaf rust, and activated (or derepressed) the CaCc sub-genome.

In a work analyzing the effects of abiotic stress on the expression of genes of the mannitol biosynthesis pathway, de Carvalho et al. (2014) reported that the CaCc homeologs of CaM6PR (coding mannose-6-phosphate reductase), CaPMI (coding phosphomannose isomerase), and CaMTD (coding the NAD+-dependent mannitol dehydrogenase, oxidizing mannitol to produce mannose) were also highly expressed in leaves of C. arabica IAPAR59 subjected to drought, high salinity, and heat-shock stress.

HDE was also observed when analyzing expression of nsLTP (encoding non-specific lipid transfer proteins) genes in the separated tissue of developing beans (Cotta et al. 2014). More precisely, transcripts of CaLTP3 (CaCc) homeolog were detected at different stages of pericarp development, while CaLTP1/2 (CaCe) homeologs were weakly expressed in this tissue. However, both CaLTP homeologs were highly expressed during the first stages of endosperm development. In another study, we also reported the high expression of CaCc and CaCe homeologs of CaLTP genes in the plagiotropic buds of the drought-tolerant cultivar “IAPAR59” subjected to water limitation but not in those of the drought-susceptible cultivar “Rubi” (Mofatto et al. 2016). This could be related to the thicker cuticle observed on the abaxial leaf surface in IAPAR59 compared to Rubi.

In a more recent study, Vieira et al. (2019) analyzed the expression of five FRIGIDA-like (FRL) genes in flowers, beans, and somatic embryos of C. arabica. As previously reported (Combes et al. 2013), gene silencing was not detected for CaFRL genes, both CaCc and CaCe homeologs being expressed in all tissues analyzed. However, HDE was observed, for example, during early stages of flower development, with a bias toward the expression of the CaCc homeolog of CaFRL2, while a bias toward the CaCc homeolog of CaFRL4 was noticed in the later stages of endosperm development. For this latter gene, however, a bias toward the overexpression of the CaCe homeolog was observed in somatic embryos. This homeostasis of gene expression observed in the allopolyploid C. arabica could explain why this species has a greater phenotypic plasticity than its C. canephora parent (Bardil et al. 2011; Bertrand et al. 2015).

3 Gene Expression in Coffee Tissues

3.1 Beans

Several thousand bean cDNAs were generated within the first coffee EST sequencing projects. For example, the Nestlé and Cornell project used three fruit libraries of C. canephora prepared at early (whole cherries, 18–22 weeks after pollination [WAP]), middle (endosperm and perisperm, 30 WAP), and late (endosperm and perisperm, 42–46 WAP) stages of fruit development, leading to 9,843; 10,077; and 9,096 ESTs, respectively (Lin et al. 2005). The IRD and CENICAFE sequencing projects also generated, respectively, more than 5,800 ESTs from C. canephora and 9,500 ESTs from C. arabica, but without mentioning the fruit developmental stage (Poncet et al. 2006; Montoya et al. 2007), while the BCGP produced 14,779 ESTs from 2 fruit libraries (FR1 and FR2) of C. arabica and 15,162 from 2 libraries (FR4 and FV2) of C. racemosa (Vieira et al. 2006; Mondego et al. 2011).

Regarding the 700 genes reported in Tables 1, 2, and 3, most expression studies were performed in developing coffee beans, which is not surprising given that analysis of the bean transcriptome is required to understand the basis of genetic and environmental variations in coffee quality. The time between anthesis and full ripening differs between C. arabica (from 6 to 8 months) and C. canephora (from 9 to 11 months), and it is usually expressed in days (or weeks) after anthesis (DAA), flowering (DAF), or pollination (DAP) (De Castro and Marraccini 2006). The different stages of developing coffee cherries are mainly defined by their size and by the changes in exocarp (pulp) color occurring during the latest maturation steps (Pezzopane et al. 2003; Morais et al. 2008; Gaspari-Pezzopane et al. 2012; Vieira et al. 2019).

Considering the bean and its tissues, it is now well known that important changes occur during its development. Soon after fecundation and up to mid-development (e.g., 90–120 DAF for C. arabica), the bean consists mainly of perisperm (maternal tissue), which is thereafter progressively replaced by the endosperm, which hardens as it ripens during the maturation phase (Fig. 1). From a practical point of view, most of the gene expression studies performed during bean development (referred to as BD in Tables 1, 2, and 3) analyzed the bean as a whole, without extracting RNA from separated perisperm and endosperm tissues. Although the perisperm is indeed the main tissue in the earliest stages of development (up to 90 DAF), this is no longer the case afterward, when it is reduced to the thin silver skin membrane surrounding the bean. Several works analyzed expression in separated perisperm and endosperm tissues, such as those studying expansins and HMGRs (3-hydroxy-3-methylglutaryl-CoA reductases) (Budzinski et al. 2010) or enzymes of the mevalonic acid (MVA) pathway involved in the biosynthesis of the diterpenes cafestol and kahweol (Tiski et al. 2011).

Fig. 1 Schematic representation of the seven developmental stages and tissue changes occurring during fruit development of C. arabica. The time is indicated in days after flowering (DAF). Tissues: Pe perisperm, En endosperm, Pc pericarp. RT-qPCR gene expression profiles of CaCSP1, CaOLE-1, and CaManS1 (coding for 11S globulin, oleosin, and mannan synthase, respectively) are chosen to illustrate accumulation of storage proteins, triacylglycerols, and cell wall polysaccharides. Adapted from Dussert et al. (2018)

In 2008, Salmona et al. used a transcriptomic approach combining targeted cDNA arrays, containing 266 selected candidate gene sequences, and RT-qPCR on a large subset of 111 genes to decipher the transcriptional networks operating during C. arabica bean development. This study was the first to divide coffee bean development into seven stages (ST1 0–60 DAF, small fruit with aqueous perisperm; ST2 60–90 DAF, perisperm surrounding a very small liquid endosperm; ST3 90–120 DAF, aqueous endosperm growing and replacing the perisperm; ST4 120–150 DAF, soft milky endosperm; ST5 150–210 DAF, hard white endosperm with green pericarp; ST6 210–240 DAF, ripening cherries with pericarp turning to yellow; ST7 > 240 DAF, mature cherries with red pericarp) (Fig. 1). A few years later, the same research group completed this study by combining gene expression and metabolite profiles (analyzed by high-performance liquid chromatography) in order to identify the key metabolic pathways of coffee bean development (Joët et al. 2009, 2010, 2012).
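When sampling dates from different experiments have to be compared, it can be convenient to map a date expressed in DAF onto the seven stages listed above. The lookup below is a minimal sketch; because consecutive stages share their boundary values (e.g., 60 DAF), we assume here that a boundary belongs to the earlier stage, which is consistent with ST7 being defined as > 240 DAF. The function and constant names are ours.

```python
from bisect import bisect_left

# Upper DAF limits of ST1..ST6 as listed in the text; ST7 is everything above 240
STAGE_UPPER_DAF = [60, 90, 120, 150, 210, 240]
STAGES = ["ST1", "ST2", "ST3", "ST4", "ST5", "ST6", "ST7"]


def stage_for_daf(daf):
    """Return the developmental stage of a C. arabica bean for a given DAF.

    Assumption: a shared boundary (e.g., 60 DAF) is assigned to the
    earlier stage.
    """
    if daf < 0:
        raise ValueError("DAF must be non-negative")
    return STAGES[bisect_left(STAGE_UPPER_DAF, daf)]


print(stage_for_daf(30))   # ST1
print(stage_for_daf(135))  # ST4
print(stage_for_daf(240))  # ST6
print(stage_for_daf(260))  # ST7
```

These limits apply to C. arabica only; the longer fruit development of C. canephora would require a different table.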

Regarding sucrose metabolism, Geromel et al. (2006, 2008b) reported high expression of CaSUS1, coding the sucrose synthase isoform 1, at the earlier stages of endosperm development (ST4), and high expression of CaSUS2 (sucrose synthase isoform 2) at the later stages of endosperm development (ST6–7) but also in the perisperm at 205 DAF (Joët et al. 2009). Even though the perisperm is by then restricted to a thin membrane surrounding the endosperm, the high SUS2 expression detected in this tissue at that time could contribute to the peak of sucrose detected at the latest development stages in both pericarp and endosperm tissues (Rogers et al. 1999b).

Together with other studies, the genes involved in the most important biochemical pathways have now been studied, including those involved in sucrose (Geromel et al. 2006, 2008b; Privat et al. 2008; Joët et al. 2014) and raffinose (dos Santos et al. 2011, 2015; Ivamoto et al. 2017a) metabolism, the synthesis of polysaccharides such as galactomannans (Marraccini et al. 2005; Pré et al. 2008; Joët et al. 2014; Dussert et al. 2018), lipid synthesis and transport (Simkin et al. 2006; Cotta et al. 2014; Dussert et al. 2018), caffeine (Ogawa et al. 2001; Uefuji et al. 2003; Mizuno et al. 2003a, b; Koshiro et al. 2006; Perrois et al. 2015; Maluf et al. 2009; Kumar and Giridhar 2015; Kumar et al. 2017), chlorogenic acids (CGAs) (Lepelley et al. 2007, 2012b), carotenoids (Simkin et al. 2010), trigonelline (Mizuno et al. 2014), storage proteins (Marraccini et al. 1999; Simkin et al. 2006; Dussert et al. 2018), and dehydrins and LEAs (Hinniger et al. 2006) (Table 1). Altogether, these studies revealed the existence of several phases during coffee bean development. The first one (perisperm-specific) is characterized by the synthesis of CGA occurring early in the perisperm and by the accumulation of chitinases, as also confirmed by 2D gel electrophoresis and protein sequencing (De Castro and Marraccini 2006; Alves et al. 2016). More recently, Ivamoto et al. (2017a) performed the first large-scale transcriptome analysis of C. arabica beans during the initial developmental stages (from 30 to 150 DAF), showing, for example, the predominant expression in the perisperm of genes coding catalytic proteins, kinases, cytochrome P450, and binding site domains. The second phase (between ST3 and ST6) is characterized by the activation of the cell wall polysaccharide (mainly galactomannans and arabinogalactans) biosynthetic machinery and the synthesis of storage proteins (Marraccini et al. 1999; Pré et al. 2008; Joët et al. 2014; Dussert et al. 2018) (Fig. 1). The third phase concerns the metabolic rerouting of CGA, characterized by the HCT1 expression peak during the latest stages of seed development, and the synthesis, storage, and export of fatty acids requiring oleosins and LTPs (lipid transfer proteins). Finally, the last (endosperm-specific) phase is characterized by sucrose synthesis and accumulation and by the dehydration of beans. These steps were recently confirmed by the long-read sequencing (LRS) full-length coffee bean transcriptome (Cheng et al. 2018). In that case, the last steps of coffee bean development were characterized by a drastic drop in chitinase transcripts and a strong upregulation of genes coding, for example, late embryogenesis abundant (LEA) proteins, heat-shock proteins (HSPs), and ROS (reactive oxygen species) scavenging (e.g., superoxide dismutases, catalases, glutathione reductases, glutaredoxins, and glutathione peroxidases) and antioxidant (e.g., dehydroascorbate reductases, glutathione reductases, monodehydroascorbate reductases, and thioredoxins) enzymes (Dussert et al. 2018).

The regulation of gene expression during coffee bean development should involve specific transcription factors (TFs). In a recent study, Dong et al. (2019a) identified 63 NAC-like genes in the reference genome of C. canephora, coding TFs well known to play important functions in plant development and stress regulation (Puranik et al. 2012). After FPKM (Fragments Per Kilobase of transcript per Million mapped reads) normalization of RNAseq data generated at different stages of fruit development, these authors identified 54 CcNAC genes with DEG (differentially expressed gene) profiles during bean development, which were verified by qPCR for 10 of them. This led the authors to classify the CcNAC genes with continuously upregulated expression as positive regulators of bean development, while those showing downregulated expression were considered negatively correlated with bean development.
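For readers less familiar with the FPKM unit mentioned above, the following sketch shows how it is computed from raw counts and how two stages can then be compared on a log2 scale. The counts, gene length, and pseudocount of 1 are illustrative assumptions, not values from Dong et al. (2019a), and real DEG detection relies on dedicated statistical tools rather than on a simple ratio.

```python
from math import log2


def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def log2_fold_change(fpkm_a, fpkm_b, pseudocount=1.0):
    """log2 ratio of stage B over stage A; the pseudocount avoids log(0)."""
    return log2((fpkm_b + pseudocount) / (fpkm_a + pseudocount))


# Hypothetical gene: 500 fragments on a 2,000 bp transcript in a library
# of 20 million mapped fragments
print(fpkm(500, 2000, 20_000_000))  # 12.5

# Hypothetical FPKM values at an early and a late stage of bean development
print(log2_fold_change(3.0, 15.0))  # 2.0
```

Dividing by transcript length and library size makes values comparable between genes and between libraries, which is what allows expression profiles to be followed across developmental stages.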

In addition to the gene expression studies performed during coffee bean development, several works also analyzed gene expression in beans during drying (Bytof et al. 2007; Kramer et al. 2010; Santos et al. 2013; Selmar et al. 2006) and germination (da Silva et al. 2019; Lepelley et al. 2012a; Marraccini et al. 2001; Santos et al. 2013) processes.

3.2 Leaves

Within the Nestlé/Cornell (Lin et al. 2005) and IRD (Poncet et al. 2006) sequencing projects, 8,942 and 4,606 ESTs were generated from C. canephora leaves, respectively, while 12,024 ESTs were also sequenced from C. arabica leaves by CENICAFE (Montoya et al. 2007). The BCGP, in turn, produced 26,931 ESTs from 4 leaf libraries (LV4, LV5, young leaves from orthotropic branches, and LV8, LV9, mature leaves from plagiotropic branches) of C. arabica, as well as 5,567 ESTs of C. arabica leaves infected with leaf miner and leaf rust (RM1 library), and 13,111 ESTs from 2 leaf libraries (SH1 and SH3) of C. canephora plants grown under water deficit (Vieira et al. 2006; Mondego et al. 2011; Vinecky et al. 2012). In this project, leaf ESTs were also generated in the SS1 (960 ESTs), SH2 (7,368 ESTs), and AR1-LP1 (5,664 ESTs) cDNA libraries from tissue pools of C. arabica plantlets that were well-watered, drought-stressed, and treated with arachidonic acid, respectively. Since these studies, numerous projects aiming to study the effects of biotic and abiotic stresses in leaves by RNAseq have been performed (see Sects. 4 and 5).

In coffee, leaves are important organs not only as source organs performing photosynthesis and sugar biosynthesis (Campa et al. 2004) but also because they synthesize many other biochemical compounds such as caffeine (Frischknecht et al. 1986; Ashihara et al. 1996; Zheng and Ashihara 2004; Ashihara 2006), chlorogenic acids (CGAs) (Ky et al. 2001; Bertrand et al. 2003; Campa et al. 2017), and trigonelline (Zheng et al. 2004; Zheng and Ashihara 2004) which are further exported to beans and involved in the final cup quality (Leroy et al. 2006).

According to the data of Tables 1, 2, and 3, leaf expression studies were reported for more than 400 genes. The first published studies concerned the three methyltransferases of the caffeine pathway encoded by the XMT (xanthosine N-methyltransferase), MXMT (7-methylxanthine-N-methyltransferase or theobromine synthase), and DXMT (3,7-dimethylxanthine-N-methyltransferase or caffeine synthase) genes (Ogawa et al. 2001; Uefuji et al. 2003; Mizuno et al. 2003a, b). These studies, initially performed by semiquantitative PCR, were later completed by RT-qPCR to better specify the expression of the CaXMT1, CaMXMT1, and CaDXMT2 genes (belonging to the C. canephora sub-genome) and of CaXMT2, CaMXMT2, and CaDXMT1 (belonging to the C. eugenioides sub-genome) in young and mature leaves of C. arabica and C. canephora (Perrois et al. 2015).

Numerous other studies also detailed the leaf expression profiles of genes of photosynthesis (Marraccini et al. 2003, 2011), sugar metabolism (Privat et al. 2008), and the biosynthetic pathways of carotenoids (Simkin et al. 2008), trigonelline (Mizuno et al. 2014), CGAs (Lepelley et al. 2007, 2012b), and diterpenes (Ivamoto et al. 2017b), for example.

3.3 Roots

More than 12,000 root ESTs were produced within the BCGP from 4 libraries (RT3, roots; NS1, roots infected by nematodes; RT5, roots treated with acibenzolar-S-methyl – a systemic acquired resistance [SAR] inducer; and RT8, roots stressed with aluminum) of C. arabica (Vieira et al. 2006; Mondego et al. 2011). In 2006, 1,587 ESTs were produced from embryonic roots of two C. arabica cultivars (De Nardi et al. 2006). Among them, 1,506 sequences were used to set up a cDNA microarray, which led to the identification of 139 genes differentially expressed in response to induced SAR. In the PhD thesis of T.S. Costa (2014), 25,574 cDNA sequences were generated from roots of drought-susceptible and drought-tolerant clones of C. canephora Conilon subjected to water limitation. Even though these data were not deposited in public databases, this study made it possible to identify several genes with upregulated expression under drought (see Sect. 5.1). In a more recent RNAseq study, dos Santos et al. (2019) obtained 34,654 assembled contigs from N-starved roots of C. arabica and identified three AMT (coding specific transporters of ammonium) and three NRT (coding nitrate transporters) genes for which in silico gene expression profiles (dos Santos et al. 2017) were validated by RT-qPCR (dos Santos et al. 2019). Expression profiles in roots were also reported for genes of sugar (Geromel et al. 2006) and caffeine (Ogawa et al. 2001) biosynthetic pathways.

3.4 Flowers

Compared to fruits, leaves, and roots, the studies analyzing gene expression in flowers are very limited. In terms of genetic resources, the BCGP generated 23,036 ESTs from 3 cDNA libraries (FB1, FB2, and FB4) of flowers in different developmental stages and 14,779 ESTs from 2 libraries (FR1 and FR2) corresponding to a mixture of transcripts extracted from flower buds and fruits at different developmental stages (Vieira et al. 2006; Mondego et al. 2011). The CENICAFE research group also reported the production of 8,707 EST sequences from flowers of C. arabica (cv. Caturra), but these data were not released in public databases either. In a recent RNAseq study, Ivamoto et al. (2017a) identified several genes that were exclusively expressed in flowers such as those coding a FASCICLIN-like arabinogalactan protein precursor (FLA3, a protein with InterPro FAS1 Domain IPR000782) and a pectin esterase inhibitor (InterPro Domain IPR006501).

The studies of Asquini et al. (2011) and Nowak et al. (2011), aiming to characterize S-RNase genes and to analyze their expression in pistils (at pre- and post-anthesis stages) and stamens of C. arabica and C. canephora flowers, are also worth noting.

Other studies characterized the genes of C. arabica coding MADS-box TFs (involved in the floral organ identity) and also checked the expression of FLOWERING LOCUS C (FLC), AGAMOUS, APETALA3, and SEPALLATA3 (de Oliveira et al. 2010, 2014). In a more recent study, Vieira et al. (2019) analyzed the expression of five FRIGIDA-like (FRL) genes, coding key proteins that regulate flowering by activating FLC (Wang et al. 2006). In that case, these authors used the qPCR TaqMAMA-based method (Li et al. 2004) to identify the expression of CaCc and CaCe homeologs of FRL genes in C. arabica flowers at different development stages (see also Sects. 2.3 and 3.5). Altogether, these results should help us to understand the genetic determinisms controlling the gametophytic self-incompatibility system of C. canephora (Berthaud 1980; Lashermes et al. 1996; Moraes et al. 2018) and coffee male sterility (Mazzafera et al. 1990; Toniutti et al. 2019a).

3.5 Somatic Embryogenesis

In coffee, the somatic embryogenesis (SE) is important particularly to propagate elite clones of C. canephora and F1 hybrids of C. arabica that could not be spread by seeds (Etienne et al. 2018; Bertrand et al. 2019; Georget et al. 2019). This is the reason why several laboratories are working to identify the genes controlling the main phases and key developmental switches of coffee SE. This also explains the important number (12) of cDNA libraries from suspension cells, calli (primary, embryogenic, and non-embryogenic), and embryos performed in the frame of the BCGP, which generated more than 65,000 ESTs (Vieira et al. 2006; Mondego et al. 2011).

Among these genes, it was reported that the expression of CcLEC1 (LEAFY COTYLEDON 1, a key regulator for embryogenesis) and CcBBM1 (BABY BOOM 1, an AP2/ERF TF associated with cell proliferation) was only observed after SE induction in C. canephora, whereas CcWOX4 (WUSCHEL-RELATED HOMEOBOX4, a plant regulator of embryogenic patterning and stem cell maintenance) expression decreased during embryo maturation (Nic-Can et al. 2013). The expression of BBM and SERK1 (somatic embryogenesis receptor-like kinase 1, a positive regulator of SE activating the YUCCA [flavin-containing monooxygenase]-dependent auxin biosynthesis) genes could also constitute a good parameter for evaluating the development and quality of C. arabica (Silva et al. 2014, 2015; Torres et al. 2015) and C. canephora (Pérez-Pascual et al. 2018) embryogenic cell suspensions. The fact that expression of FLC and FRL (especially that of CaFRL-3, CaFRL-4, and CaFRL-5) genes, initially reported as regulators of flowering development, was also observed in both zygotic and somatic embryos of C. arabica (Vieira et al. 2019) clearly indicates that both embryogenesis processes share common developmental pathways.

In order to better understand the transcriptomic changes occurring during the SE process, Quintana-Escobar et al. (2019) recently performed the first RNAseq study analyzing different stages of SE induction in C. canephora. Among the genes differentially expressed, these authors identified eight ARF (auxin response factors) as well as seven Aux/IAA (auxin/indole-3-acetic acid regulators) and confirmed that CcARF18 and CcARF5 genes were highly expressed after 21 days of SE induction. In another recent study, Pinto et al. (2019) characterized 17 GH3 genes from C. canephora (encoding Gretchen Hagen 3 proteins, already reported to play a key role in controlling somatic embryogenesis induction through auxin) and analyzed their expression profiles in cells with contrasting embryogenic potential in C. arabica, showing that CaGH3.15 expression was correlated with that of CaBBM, a C. arabica ortholog of a major somatic embryogenesis regulator (Silva et al. 2015). Altogether, these genes could be useful as markers to follow the SE stage converting somatic to embryogenic cells.

4 Coffee Gene Expression in Response to Biotic Stress

Recent modeling studies have warned that climate change (CC) threatens coffee production by increasing attacks by pests and pathogens (Avelino et al. 2004, 2015; Ghini et al. 2008, 2011, 2015; Jaramillo et al. 2011; Kutywayo et al. 2013; Magrach and Ghazoul 2015). For both C. canephora and C. arabica, the main pests and diseases are (1) the leaf rust caused by the fungus H. vastatrix, (2) the leaf miner Leucoptera coffeella (Guérin-Mèneville), (3) the root attacks caused by nematodes, (4) the fruit damage caused by the borer Hypothenemus hampei, and (5) the coffee berry disease (CBD) caused by the hemibiotrophic fungus Colletotrichum kahawae, which is a major constraint of C. arabica coffee production in Africa (van der Vossen and Walyaro 2009).

Regarding coffee genetic diversity, most C. canephora genotypes are resistant to coffee leaf rust (CLR), while “pure” (non-introgressed) C. arabica genotypes are susceptible. However, Catimor and Sarchimor cultivars of C. arabica introgressed with the HdT are considered as totally or partially resistant to CLR (Eskes and Leroy 2004). Natural resistances to coffee berry borer (CBB) and coffee leaf miner (CLM) are rather limited in both C. canephora and C. arabica species. However, natural resistance to the CLM can be found in several wild coffee diploid species, such as in C. racemosa (Guerreiro-Filho et al. 1999; Guerreiro-Filho 2006), and has been introgressed into C. arabica to generate new cultivars (e.g., Siriema) resistant to CLR (Matiello et al. 2015). Regarding nematodes, a large genetic diversity in resistance, particularly to the root-knot nematodes Meloidogyne spp., exists in diploid species (e.g., C. canephora, C. liberica, and C. congensis) but less so in C. arabica, ranging from high susceptibility to near immunity, as is the case for clone 14 of C. canephora Conilon (Lima et al. 2014, 2015). Information about genetic resistance to CBB is very limited for both C. arabica and C. canephora species. However, Romero and Cortina (2004, 2007) reported a reduction of CBB growth rate when H. hampei is fed with C. liberica fruits. In another study, Sera et al. (2010) showed that C. kapakata, Psilanthus bengalensis, C. eugenioides, as well as genotypes introgressed with C. eugenioides were CBB resistant (CBBR). In that case, the CBB resistance of C. eugenioides and C. kapakata was observed at the pericarp level (but not in the bean), while P. bengalensis presented CBB resistance in both tissues. In addition to being CLR resistant (CLRR), some C. arabica genotypes derived from HdT, as well as the F1 hybrid cultivar Ruiru 11, were also reported as CBD resistant (CBDR) (Omondi et al. 2004; Walyaro 1983; Van der Vossen 1985). This genetic diversity observed in the Coffea genus regarding these different biotic stresses could be used to identify the genes controlling these resistances and to initiate new breeding programs aiming to create new hybrids with better resistance to pests and diseases.

On the other hand, the BCGP produced more than 5,000 ESTs of C. arabica from the RM1 (leaves infected with CLM and CLR) and NS1 (roots infected with nematodes) libraries (Vieira et al. 2006; Mondego et al. 2011). In a recent study, genes coding for the LOX (lipoxygenase), AOS (allene oxide synthase), AOC (allene oxide cyclase), and OPR (12-oxo-phytodienoic acid reductase) enzymes involved in the production of jasmonic acid (JA), one of the key plant hormones involved in plant defense against insect pests, were identified in C. canephora by bioinformatic approaches (Bharathi and Sreenath 2017) but without confirming gene expression of this pathway in infested coffee plants.

4.1 Coffee Leaf Rust (CLR)

In 2004, Fernandez et al. used the suppression subtractive hybridization (SSH) method and semiquantitative RT-PCR to identify C. arabica L. genes involved in the specific hypersensitive reaction (HR) upon infection by H. vastatrix. Among the genes showing HR upregulation were those coding for receptor kinases, AP2 domain and WRKY TFs, cytochromes P450, heat-shock 70 proteins, several glucosyltransferases, and NDR1, for example. Other studies showed that salicylic acid (SA) and methyl jasmonate (MeJA) treatments markedly upregulated the expression of CaNDR1 (coding a non-race-specific disease resistance protein well known to be involved in the resistance signaling pathway in Arabidopsis thaliana) and CaWRKY1 genes, suggesting a key role of their corresponding proteins in the molecular resistance responses of coffee to H. vastatrix (Ganesh et al. 2006; Cacas et al. 2011; Petitot et al. 2008, 2013). This was confirmed by Ramiro et al. (2010) who showed that in addition to CaWRKY1, expression of CaWRKY3, CaWRKY17, CaWRKY19/20/21, and CaWRKY22 genes was also highly upregulated upon CLR. Although a significant correlation was also observed between WRKY expression profiles after MeJA and rust treatments, expression of coffee genes involved in JA biosynthesis, including allene oxide synthase (CaAOS) and lipoxygenase (Ca9-LOX and Ca13-LOX), did not support the involvement of JA in the early coffee resistance responses to CLR.

The first valuable EST dataset from C. arabica CIFC 147/1 (CLR resistant) infected by leaf rust was produced by Fernandez et al. (2012) who identified 205,089 ESTs and 13,951 contigs from coffee together with 57,332 ESTs and 6,763 contigs from H. vastatrix. Among the most abundant coffee genes expressed in rust-infected leaves were those coding for several pathogenesis-related (thaumatin-like) proteins and enzymes of carbohydrate, amino acid, and lipid transport/metabolism. Florez et al. (2017) also used the C. arabica cultivars Caturra (CLR susceptible) and HdT CIFC 832/1 (CLR resistant) to generate 43,159 contigs which were assembled using as a reference the genome of C. canephora (Denoeud et al. 2014). Among DEG profiles identified by RT-qPCR were genes coding for a putative disease resistance protein RGA1, a putative disease resistance response (dirigent-like protein) family protein, and a premnaspirodiene oxygenase, with higher expression at the early stage of rust infection in the resistant cultivar than in the susceptible genotype. In addition, expression of several TFs (putative basic helix-loop-helix bHLH DNA-binding superfamily protein and ethylene-responsive transcription factor 1B) was detected earlier in HdT than in Caturra, suggesting that they may be involved in the defense mechanisms of the CLRR cultivar. In a more recent study, Echeverría-Beirute et al. (2019) used an RNAseq approach to study the effects of CLR and fruit thinning in leaves of the susceptible cultivars red Catuaí (Caturra x Mundo Novo) and F1 hybrid H3 (Caturra x Ethiopian 531) of C. arabica. Using regression and prediction statistical models, these authors identified 460 DEGs between the inbred and the F1 hybrid. Among them, the expression of PR (pathogenesis-related) genes was upregulated in Catuaí, while the expression of those coding proteins involved in homoeostasis increased in the F1 hybrid. Even though these results were not confirmed by RT-qPCR, they support the hypothesis of a lower impact of CLR in F1 hybrids (Echeverría-Beirute et al. 2018) due to their physiological status, which itself depends on their genetic background, plant vigor, agronomic conditions, and environmental factors (Toniutti et al. 2017, 2019b).

4.2 Coffee Leaf Miner (CLM)

Although the defense mechanisms to leaf miner are not well understood, previous genetic analyses suggested that this resistance was dominant and controlled by a limited number of genes (Guerreiro-Filho et al. 1999). The first attempt to identify these genes was performed by SSH method coupled with the screening of DNA macroarrays to study gene expression in the leaves of the CLM-susceptible (CLMS: red Catuaí) and CLM-resistant cultivar (CLMR corresponding to a backcross of [C. racemosa x C. arabica x C. arabica]) infested by L. coffeella (Mondego et al. 2005). From the 1,500 ESTs spotted on the array, upregulated expression upon CLM infestation was observed for several ESTs coding proteins previously reported to be related to plant defense and biotic stress and similar to the phospholipase D, the lipoxygenase LOX3, the late embryogenesis abundant protein 1 (LEA1), the acid phosphatase vegetative storage protein (VSP), and the lipid transfer protein/trypsin inhibitor/seed storage domain, for example. For CaPR8 (class III chitinase), CaSPC25 (signal peptidase complex subunit), CaPSAH (photosystem I), CaCAX9 (a putative calcium exchanger), and CaBEL (BEL1-related homeotic protein 29) genes, their upregulated expression upon CLM infestation suggested that they play a key role in coffee defense mechanisms against L. coffeella.

In a more recent study, Cardoso et al. (2014) used a 135 K microarray (NimbleGen) based on the 33,000 genes identified in the frame of the BCGP to identify DEGs in CLMS and CLMR cultivars of C. arabica at three stages (T0, non-infected/control; T1, egg hatching; and T2, egg eclosion) of interaction with L. coffeella. Even though previous studies reported that caffeine has no effect on leaf miner survival rates (Guerreiro-Filho and Mazzafera 2000; Magalhães et al. 2010), highly upregulated expression of a putative caffeine synthase gene was reported at both T0 and T2 in CLMR leaves compared to CLMS ones. In the same study, genes involved in plant response pathways to herbivory attacks (e.g., linoleic acid cycle, phenylpropanoid synthesis, and apoptosis), as well as in JA (e.g., coding lipoxygenase and enoyl-CoA hydratase) and flavonoid (e.g., coding chalcone synthase and flavanone 3-hydroxylase-like) biosynthesis, were also upregulated in CLMR plants even in the absence (at T0) of leaf miner infestation, indicating that defense was already built up in these plants prior to infestation, as a priming mechanism.

4.3 Nematodes (NEM)

Despite the important damage caused by nematodes, there are a limited number of studies analyzing the coffee gene responses to these pathogens. When studying WRKY genes coding transcription factors regulating plant responses to biotic stresses, Ramiro et al. (2010) reported that expression of CaWRKY6, CaWRKY11, CaWRKY12, CaWRKY13/14, CaWRKY15, and CaWRKY17 genes was upregulated in roots of C. arabica cv. IAPAR59 infected by the root-knot nematode (RKN) Meloidogyne exigua. In another work, Severino et al. (2012) reported upregulated expression of CaPRX (encoding a putative class III peroxidase) in roots inoculated with the RKN M. paranaensis but with significant difference between susceptible (C. arabica cv. Catuaí) and resistant (C. canephora cv. Robusta) plants. The nematode-resistant (NEMR) clone 14 of C. canephora Conilon (Lima et al. 2014, 2015) was also used to investigate gene expression in roots at regular days after infestation (4, 8, 12, 20, 32, and 45 DAI) by the RKN M. paranaensis (Lima 2015). The RNAseq data (not yet publicly available) showed higher expression levels of several PR (pathogenesis-related) genes, such as those coding class III chitinase and NBS-LRR proteins, in infected roots of NEMR clone 14 than in those of the nematode-susceptible (NEMS) clone 22. In addition, the peak of NBS-LRR transcripts was detected at 8 and 20 DAI for the clones 14 and 22, respectively, suggesting earlier expression of this gene in NEMR than in NEMS coffee clones (Valeriano et al. 2019). RT-qPCR experiments also showed that expression of CcCPI1 (coding a cysteine proteinase inhibitor) was higher in roots of clone 14 than in those of clone 22, with or without nematode infestation, suggesting that this protein, also highly expressed in coffee beans under development and germination (Lepelley et al. 2012a), could also play a key role in controlling nematode development. In that sense, CPIs have already been reported to inhibit proteinases in the digestive tracts, therefore reducing the destructive effects of herbivorous insects (Benchabane et al. 2010; Schluter et al. 2010), and to increase tolerance to nematodes as well as to fungal and bacterial pathogens in transgenic plants (Urwin et al. 2003; Martinez et al. 2005).

4.4 Coffee Berry Borer (CBB)

Considering that C. arabica fruits are more susceptible to CBB than those of C. liberica, Idárraga et al. (2012) constructed cDNA libraries from fruits of these two species infested with H. hampei and generated 3,634 singletons and 1,454 contigs. In silico analyses revealed that infested C. arabica berries displayed a higher number of DEGs coding proteins involved in general stress responses, while genes coding proteins involved in insect defense were overexpressed in C. liberica. For some of these genes, expression profiles in infested cherries were checked by RT-qPCR. Interestingly, expression levels of genes coding a hevein-like protein, an isoprene synthase, a SA carboxyl methyltransferase, and a patatin-like protein appeared much more upregulated in C. liberica than in C. arabica. The upregulation of these genes was already reported in other plants in response to insect herbivory and JA treatments (Kiba et al. 2003; Reymond et al. 2000; Falco et al. 2001), suggesting that they could be involved in the partial resistance to CBB in C. liberica.

4.5 Coffee Berry Disease (CBD)

Cytological and biochemical studies revealed that coffee resistance to C. kahawae is characterized by restricted fungal growth associated with several host responses, such as hypersensitive-like cell death (HR), callose deposition, accumulation of phenolic compounds, lignification of host cell walls, and increased activity of oxidative and peroxidase enzymes (Silva et al. 2006; Gichuru 1997, 2007; Loureiro et al. 2012).

The first study analyzing gene expression in response to C. kahawae was performed by Figueiredo et al. (2013) in hypocotyls of C. arabica cultivars Catimor 88 (HdT derivative CBDR) and Caturra CIFC 19/1 (CBDS). These authors showed that expression levels of RLK (coding a receptor-like kinase) and PR10 (coding a pathogenesis-related protein 10) genes were higher in Catimor than in CBD-infected Caturra. Interestingly, upregulated expression of these two genes was also reported during coffee infection with H. vastatrix (Fernandez et al. 2004). In order to understand the molecular mechanisms involved in coffee resistance to C. kahawae, Diniz et al. (2017) evaluated the expression of genes involved in SA, JA, and ethylene (ET) pathways in the same cultivars. From the 14 genes studied by RT-qPCR, these authors showed the involvement of JA and ET phytohormones rather than SA in this pathosystem. Regarding the ET pathway, the strong activation of ERF1 gene (coding for ET receptor) at the beginning of the necrotrophic phase suggests the involvement of ethylene in tissue senescence.

4.6 Gene Expression in Response to Other Pests and Diseases

Of the two commercially cultivated coffee species, C. arabica and C. canephora are considered as susceptible and resistant, respectively, to the insect pest Xylotrechus quadripes known as coffee white stem borer (CWSB). Using an SSH approach, Bharathi et al. (2017) identified 265 unigenes overexpressed in C. canephora bark tissues upon CWSB larval infestation, many of them coding putative pectin-degrading enzymes like a pectate lyase (Cc07_g00190), three polygalacturonases (Cc03_g15700, Cc03_g15740, and Cc03_g15840), and a pectinacetylesterase (Cc08_g04630). By RT-qPCR, these authors also showed that the expression of Cc07_g00190 was strongly induced at 72 h after CWSB infestation. The possible role of this pectinolytic enzyme in the production of oligogalacturonides was proposed, which could act as elicitors involved in defense responses of C. canephora to CWSB (Bharathi and Sreenath 2017).

5 Coffee Gene Expression in Response to Abiotic Stress

Several models predicted that CC will have strong negative impacts on both C. canephora and C. arabica species at environmental, economic, and social levels (Assad et al. 2004; Bunn et al. 2015a, b; Ovalle-Rivera et al. 2015; Davis et al. 2012, 2019; Moat et al. 2017, 2019). Drought and high air temperatures are undoubtedly the major threats to coffee production forecasted under potential climate changes (IPCC 2013). Drought is a limiting factor that affects flowering and yield of coffee (DaMatta and Ramalho 2006), as well as bean development and biochemical composition and consequently the final cup quality (Silva et al. 2005; Vinecky et al. 2017). Increased [CO2] in air is also a key factor for coffee plant acclimation to high temperature, strengthening the photosynthetic pathway, metabolism, and antioxidant protection and modifying gene transcription and mineral balance (Ramalho et al. 2013; Martins et al. 2014, 2016; Ghini et al. 2015; Rodrigues et al. 2016). In this context, understanding the genetic determinism of coffee’s adaptation to abiotic stress has become essential for creating new varieties (Cheserek and Gichimu 2012).

5.1 Drought

The first study analyzing the effects of drought stress was performed by Simkin et al. (2008), who reported the gene expression profiles of the carotenoid biosynthesis pathway in leaf, branch, and flower tissues of C. arabica subjected to water withdrawal. In this work, it was shown that the transcript levels of PTOX, CRTR-B, NCED3, CCD1, and FIB1 increased under drought, suggesting that drought favored the synthesis of xanthophylls implicated in the adaptation of plastids to changing environmental conditions by preventing photooxidative damage of the photosynthetic apparatus. On the other hand, drought was reported to decrease the RBCS1 gene expression in both C. arabica and C. canephora species (Marraccini et al. 2011, 2012). However, this reduction was not accompanied by a decrease of RBCS1 protein in the leaves of C. canephora under water withdrawal. In the same work, it was also shown that the transcriptional contribution of each RBCS1 homeolog may be affected by drought in C. arabica cultivars (Marraccini et al. 2011). In C. canephora, and whatever the clone studied, drought was also shown to downregulate the leaf expression of many genes related to photosynthesis such as CcCAB1 (coding chlorophyll a-/b-binding proteins), CcCA1 (coding for the carbonic anhydrase supplying CO2 for Rubisco), as well as expression of CcPSBO, CcPSBP, and CcPSBQ genes coding proteins of the PSII oxygen-evolving complex (Marraccini et al. 2012; Vieira et al. 2013).

On the other hand, drought stress significantly upregulated the expression of genes coding proteins involved in maintenance, reinforcement, and protection during the dehydration-rehydration process such as dehydrins and glycine-rich and heat-shock proteins in C. canephora (Marraccini et al. 2012; Vieira et al. 2013) and C. arabica (Santos and Mazzafera 2012; Mofatto et al. 2016). Drought stress was also shown to increase expression of some PIP (plasma membrane intrinsic proteins) genes in the leaves and roots of different coffee species, suggesting the involvement of these aquaporins in controlling the water status in coffee plants (dos Santos and Mazzafera 2013; Miniussi et al. 2015).

In coffee, like in many other plants, drought stress was also reported to affect the metabolic pathways involved in the synthesis of many solutes such as sugars of the raffinose family oligosaccharides (RFOs) (e.g., trehalose, raffinose, and stachyose), already described to be involved in osmoprotection against abiotic stresses in plants (Kerepesi and Galiba 2000). The upregulated expression of CaGolS2 and CaGolS3 genes coding galactinol synthases explained the increase of raffinose and stachyose contents also observed in leaves of C. arabica cv. IAPAR59 plants submitted to severe water deficit (dos Santos et al. 2011). In C. canephora Conilon, water limitation also increased CcGolS1 gene expression in leaves of the drought-tolerant (DT) clone 14 but decreased the expression of the same gene in leaves of the drought-susceptible (DS) clone 109A (dos Santos et al. 2015). Drought was also shown to upregulate the expression of M6PR gene coding the mannose-6-phosphate reductase in leaves of both C. canephora (Marraccini et al. 2012) and C. arabica (Freire et al. 2013). In C. arabica cv. IAPAR59, the increased expression of CaPMI (mannitol synthesis) and decreased CaMTD (controlling mannitol degradation) expression under drought were correlated with high mannitol levels detected in leaves under drought conditions (de Carvalho et al. 2014).

Drought also increased the expression of regulatory genes CcRD29, CcRD26, and CcDREB1D coding an RD29-like protein, a NAC-RD26-like TF, and an AP2/ERF DREB-like TF, respectively, in DT (14, 73, and 120) and DS (22) clones of C. canephora Conilon (Marraccini et al. 2012; Vieira et al. 2013). Even though these studies highlighted the existence of different mechanisms among the DT clones of C. canephora regarding water deficit, they also showed that CcDREB1D expression was always higher in leaves of DT clones (particularly in clone 14) than in those of DS clone 22 under water withdrawal (Fig. 2). Upregulated expression of CcDREB1D was also reported in leaves of C. canephora and C. arabica subjected to low relative humidity (Thioune et al. 2017; Alves et al. 2018). A study of CcDREB1D promoter regions in the DT clone 14 and DS clone 22 revealed the existence of several haplotypes diverging by several SNPs and insertions/deletions (Alves et al. 2017). A functional analysis of these promoters in transgenic plants of C. arabica var. Caturra showed that haplotype HP16 (found in the DT clone 14) was able to drive the expression of the uidA reporter gene under water deficit in leaf mesophyll and guard cells more strongly and earlier than the HP15 (present in both clones) and HP17 (only present in DS clone 22) haplotypes (Alves et al. 2017). In a more recent work aiming to study the expression of DREB-like genes regarding various abiotic stresses (Torres et al. 2019), drought (mimicked by water limitation) was shown to upregulate expression of CcDREB1B, CcRAP2.4, CcERF027, CcDREB1D, and CcTINY mainly in leaves of C. canephora DT clones, while drought (mimicked by low humidity) upregulated the expression of CaERF053, CaRAP2.4, CaERF017, CaERF027, CaDREB1D, and CaDREB2A.1 in leaves of C. arabica. On the other hand, expression of CcDREB2F, CcERF016, and CcRAP2.4 genes was greatly upregulated under drought specifically in the roots of DS clone 22 (Fig. 2), which could help this clone to compensate for its low efficiency in controlling stomatal closure and the high reduction of net CO2 assimilation (A) observed upon drought acclimation (Marraccini et al. 2012).

Fig. 2

Gene expression profiles of DREB-like genes in leaves and roots of DT (14, 73, and 120) and DS (22) clones of C. canephora Conilon subjected (NI not irrigated, black bars) or not (I irrigated, white bars) to water limitation. The DT and DS clones are separated by a vertical dotted line. Gene names are indicated in the histograms. Expression values corresponding to the mean of three biological and technical replications (±SD) are expressed in fold change relative to the expression level of the sample 22I as the reference sample (relative expression = 1). Transcript abundances were normalized using the expression of the CcUBQ10 gene (Barsalobres-Cavallari et al. 2009) as the endogenous control. Treatments sharing the same letter are not significantly different. Data adapted from Torres et al. (2019)
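The normalization described in this legend (an endogenous control gene and a reference sample set to 1) can be illustrated with a minimal sketch. It assumes the widely used 2^-ΔΔCt calculation, which the legend itself does not name, and all Ct values and the function name `relative_expression` are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean


def relative_expression(ct_target, ct_control, ct_target_ref, ct_control_ref):
    """Fold change of a target gene by the 2^-ddCt calculation (assumed here).

    ct_target / ct_control: replicate Ct values of the target gene and of the
    endogenous control (e.g., CcUBQ10) in the sample of interest.
    ct_target_ref / ct_control_ref: the same measurements in the reference
    sample (e.g., 22I), whose relative expression is 1 by construction.
    """
    # Normalize the target gene to the endogenous control within each sample.
    delta_ct_sample = mean(ct_target) - mean(ct_control)
    delta_ct_reference = mean(ct_target_ref) - mean(ct_control_ref)
    # Compare the sample with the reference sample.
    delta_delta_ct = delta_ct_sample - delta_ct_reference
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values (three replicates each), not taken from the source.
reference_target = [28.0, 28.0, 28.0]   # target gene in sample 22I
reference_control = [20.0, 20.0, 20.0]  # endogenous control in sample 22I
stressed_target = [25.0, 25.0, 25.0]    # target gene in a non-irrigated sample
stressed_control = [20.0, 20.0, 20.0]   # endogenous control in that sample

fold_change = relative_expression(
    stressed_target, stressed_control, reference_target, reference_control
)
reference_fold = relative_expression(
    reference_target, reference_control, reference_target, reference_control
)
print(fold_change, reference_fold)
```

With these invented values the target transcript crosses the threshold three cycles earlier in the stressed sample than in the reference, which corresponds to an eightfold higher relative expression, while the reference sample compared with itself gives 1.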

M.G. Cotta (2017) also analyzed the expression profiles of genes coding the PYR/PYL/RCAR-SnRK2-PP2C proteins, known to be involved in the first steps of ABA perception and signal transduction in plants (Klingler et al. 2010), in leaves and roots of DT (14, 73, and 120) and DS (22) clones of C. canephora subjected to drought. In leaves, drought downregulated the expression of CcPYR1, CcPYL2, and CcPYL4 genes (coding ABA receptors) and upregulated the expression of CcAHG2 and CcHAB (coding PP2C phosphatases functioning as negative regulators of the ABA pathway) in DT clones. However, expression of SnRK2 genes (coding protein kinases functioning as positive regulators of this pathway) was poorly affected by drought conditions. On the other hand, drought upregulated the expression of PP2C (e.g., CcABI1, CcABI2, CcAHG3) and SnRK2 (e.g., SnRK2.2, SnRK2.6, and SnRK2.7) genes mainly in roots of C. canephora DT clone 120. CcPYL8b was the gene most expressed in drought-stressed roots, particularly in DT clones 73 and 120, while expression of CaPYL8a was upregulated by drought mainly in leaves of C. arabica DT accession (Santos et al. 2019).

In C. canephora, Menezes-Silva et al. (2017) reported that coffee plants exposed to multiple drought events tended to display a higher expression of the RD29B and RD22 genes which could be involved in acclimation to repeated drought events. Recently, de Freitas Guedes et al. (2018) performed an RNAseq study to analyze the effects of multiple drought stress on gene expression in leaves of the DT clone 120 and DS clone 109 of C. canephora. Among the 22,764 genes generated, these authors identified 49 genes in the DT clone (e.g., coding MYB-like proteins or defense-related proteins containing LRR and kinase domains), which could be involved in stress “memory.”

As previously mentioned, Costa (2014) analyzed the expression profiles of several genes in roots of DS and DT clones of C. canephora Conilon submitted to water limitation. Among the identified DEGs, it is worth noting that upregulated expression was specifically observed under drought in roots of the DT clone 14 for the CcMJE1 (coding a protein involved in MeJA metabolism), CcNCED3 (encoding a rate-limiting protein involved in the synthesis of abscisic acid), CcPAP1 (coding a putative protein containing the acid phosphatase domain TIGR01675 characterizing vegetative storage proteins (VSPs)), CcPRX1 (coding for a putative peroxidase), and CcclXIP (coding a chitinase-like xylanase inhibitor protein), as well as CcM6PR, CcGOLS3b, and CcLTP4 (involved in RFOs and lipid biosynthesis pathways) genes. Previously, Vasconcelos et al. (2011) had reported that the protein expressed from the CaclXIP cDNA (originally identified as a class III chitinase encoding gene from C. arabica) functioned as a chitinase-like xylanase inhibitor protein (clXIP) of fungal xylanases. Altogether, these responses suggest the existence of cross talk between abiotic and biotic pathways in roots of DT clone 14 which could explain its drought tolerance and resistance to several RKN species of the genus Meloidogyne (see Sect. 4.3).

It is also worth noting that expression of many genes cited in this section (e.g., coding dehydrins, enzymes of carotenoid and RFO pathways, and other proteins involved in stabilization of membranes and proteins) was also studied during the last stages of coffee bean development (Hinniger et al. 2006; Simkin et al. 2010; Ivamoto et al. 2017a; Dussert et al. 2018), characterized by the intense dehydration of endosperm (De Castro and Marraccini 2006; Eira et al. 2006).

5.2 High Temperature

The study of Bardil et al. (2011) was the first to analyze the effects of low (LT, day 26°C/night 22°C) and high (HT, day 30°C/night 26°C) temperature on homeologous genes expressed in leaves of C. arabica and in those of its two ancestral parents, C. canephora and C. eugenioides. Among the 15 K unigenes analyzed, around 50% appeared differentially expressed (with 25% upregulated) at low temperature between C. arabica, C. canephora, and C. eugenioides. Similar proportions were found at high temperature when comparing the transcriptome of C. arabica vs. C. eugenioides and C. canephora vs. C. eugenioides. However, only 8.9% of transcriptome divergence was observed when comparing C. arabica vs. C. canephora. In terms of expression patterns observed in C. arabica, the number of genes with “C. canephora-like dominance” increased from 8–14% under LT (in the Java and T18141 cultivars) to 21–26% under HT conditions. It is worth noting that the transcription profiles of T18141 (a cultivar recently introgressed with the C. canephora genome) were more similar to those of C. canephora than were those of the “pure” (not recently introgressed) Java cultivar. Altogether, these results indicate that C. arabica mainly expressed genes from its CaCc sub-genome under hot temperatures.

In another work, Bertrand et al. (2015) analyzed gene expression profiles in leaves of C. arabica, C. eugenioides, and C. canephora (cv. Nemaya) exposed to four thermal regimes (TRs: 18–14, 23–19, 28–24, and 33–29°C). Under hot temperatures, upregulated expression in C. arabica was observed for several genes such as Cc10_g00570 coding a catalase (CAT3) (when compared to C. canephora) and Cc06_g11950 coding a photosystem II subunit X (when compared to C. eugenioides). On the other hand, Cc05_g04680 coding an L-ascorbate oxidase homolog, the photosynthetic genes coding light-harvesting complex (LHCII: Cc04_g16410) and chlorophyll a–b-binding protein (CAB: Cc10_g00140, Cc05_g12720, Cc09_g09020, Cc05_g09650, and Cc09_g09030), and the respiration-like genes Cc10_g00410, Cc02_g25840, and Cc07_g00550 (coding a chloroplast glyceraldehyde-3-phosphate dehydrogenase, a chloroplast ribose-phosphate pyrophosphokinase, and a Rubisco methyltransferase, respectively) were strongly downregulated in C. arabica compared with its two parents.

In leaves of C. arabica, heat-shock conditions also upregulated the expression of CaGolS1, CaGolS2, CaGolS3, CaPMI, CaMTD, and CaERF014 and downregulated the expression of CaM6PR (dos Santos et al. 2011; de Carvalho et al. 2014; Torres et al. 2019). The interactions of high temperature and high [CO2] on expression profiles of genes coding protective and antioxidant proteins were also studied by Martins et al. (2016) and Scotti-Campos et al. (2019) (see Sect. 5.4 below).

5.3 Cold Stress

The first studies to analyze the effects of cold stress on coffee gene expression were carried out by Fortunato et al. (2010) and Batista-Santos et al. (2011), who subjected several cultivars and hybrids of C. canephora, C. arabica, and C. dewevrei to gradual cold treatments. These authors showed that upregulation of the CaGRed and CaDHAR genes (coding a glutathione reductase (GR) and a dehydroascorbate reductase, respectively) and of CaCP22, CaPI, and CaCytf (coding proteins involved in PSII, PSI, and Cytb6/f complex, respectively) could explain the ability of the Icatu (C. arabica × C. canephora) cultivar to better withstand cold stress by reinforcing its antioxidative capabilities and maintaining efficient thylakoid functioning.

In their analysis of gene expression profiles in leaves of C. arabica, C. eugenioides, and C. canephora (cv. Nemaya) exposed to different thermal regimes, Bertrand et al. (2015) also reported upregulated expression under cold stress in C. arabica for the Cc07_g15610 gene coding an L-ascorbate oxidase, for genes involved in respiration (e.g., Cc02_g08980, Cc00_g15710, and Cc02_g06960 coding a phosphoenolpyruvate carboxylase kinase, a ribulose bisphosphate carboxylase small chain, and a sedoheptulose-1,7-bisphosphatase, respectively), and also for genes of photosynthesis (e.g., Cc02_g28520, Cc05_g15930, Cc07_g10820, and Cc06_g19130 coding a ferredoxin-nitrite reductase, a photosystem II 10 kDa polypeptide, a ferredoxin-NADP reductase, and a ferredoxin-dependent glutamate synthase, respectively). On the other hand, overexpression of the Cc05_g10250, Cc00_g35890, and Cc05_g10310 genes (coding polyphenol oxidases) was seen in C. canephora under low temperatures. For the LHY (late elongated hypocotyl, Cc02_g39990) gene involved in the circadian cycle, RT-qPCR experiments confirmed the in silico data, showing higher expression under low than under high temperatures, particularly in leaves of C. canephora.

More recently, two studies investigated the effects of cold stress on leaf gene expression in C. canephora. In the first one, Dong et al. (2019a) performed gene expression analyses in leaves of C. canephora plants subjected to cold stress (C1 (7 days at day 13°C/night 8°C) followed by C2 (3 days at day 4°C/night 4°C)) but also in fruits at different stages of development. For the 38 CcNAC genes analyzed by qPCR in cold-stressed leaves, expression was (1) upregulated upon C1 and C2 treatments for 4 genes, (2) downregulated upon C1 (but not C2) for 10 genes, (3) upregulated upon C2 (but not C1) for 7 genes, and (4) downregulated upon both cold treatments for 17 genes. In the second work, the same authors characterized 49 CcWRKY genes from the reference genome of C. canephora and analyzed the expression profiles of 45 of them by qPCR in cold-stressed leaves, as in the previous study (Dong et al. 2019b). This led to the identification of 14 CcWRKY genes with expression induced during the cold acclimation stage (upon C1 and C2 treatments), 17 genes upregulated by cold treatment (C2 but not C1), and 12 downregulated by both cold stress treatments. Among the 14,513 putative target genes of CcWRKY identified in C. canephora by a genome-wide analysis, 235 were categorized into response to cold, including genes related to carbohydrate metabolism, lipid metabolism, and photosynthesis. As in many other plants, these observations clearly highlight the vital regulatory role played by WRKY TFs in various developmental and physiological processes (such as seed development) but also in responses to a range of abiotic stresses (like cold, heat, drought, and salinity) and biotic stresses (Rushton et al. 2010).
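As a quick consistency check on the counts quoted above, the response classes can be tallied against the number of genes analyzed by qPCR. The following Python sketch is only an illustration: the dictionary keys and the helper name are assumptions introduced here, and only the numbers come from the text.

```python
# Counts quoted in the text for Dong et al. (2019a, 2019b).
# The class labels below are illustrative names, not taken from the studies.
nac_classes = {
    "up_C1_and_C2": 4,
    "down_C1_only": 10,
    "up_C2_only": 7,
    "down_C1_and_C2": 17,
}
wrky_classes = {
    "induced_C1_and_C2": 14,
    "up_C2_only": 17,
    "down_C1_and_C2": 12,
}

NAC_ANALYZED = 38   # CcNAC genes analyzed by qPCR
WRKY_ANALYZED = 45  # CcWRKY genes analyzed by qPCR (out of 49 characterized)


def unclassified(classes, analyzed):
    """Return how many analyzed genes fall outside the listed classes."""
    return analyzed - sum(classes.values())


nac_rest = unclassified(nac_classes, NAC_ANALYZED)
wrky_rest = unclassified(wrky_classes, WRKY_ANALYZED)
print(nac_rest, wrky_rest)  # prints: 0 2
```

The four CcNAC classes account for all 38 genes, whereas the three CcWRKY classes cover 43 of the 45 genes analyzed; the text does not assign the remaining two to a class.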

In another work, Ramalho et al. (2018a) analyzed the impacts of single and combined exposure to drought and cold stress in C. arabica cv. Icatu, C. canephora cv. Apoatã, and the hybrid C. arabica cv. Obatã. At the physiological level, the Icatu cultivar was less affected by exposure to cold and drought stress, as shown, for example, by reduced lipoperoxidation under stress interaction. At the molecular level, simultaneous exposure of Icatu to both stresses increased the expression of genes coding ascorbate peroxidase (APX) involved in H2O2 removal (e.g., APXc [cytosolic] and APXt+s [stromatic]) and consequently total APX enzymatic activity. To a lesser extent, this situation was also observed in C. canephora, while Obatã was the least responsive genotype considering the studied genes.

5.4 CO2 Concentration

The research group of J.C. Ramalho (Lisbon University, Portugal) published several articles studying the effects of elevated [CO2] on coffee. They demonstrated that elevated [CO2] mitigated the impact of heat on coffee physiology (Rodrigues et al. 2016) and also helped preserve bean quality (Ramalho et al. 2018b). In a study aiming to analyze the interactions of elevated [CO2] and high temperature on protective response mechanisms in coffee, Martins et al. (2016) showed that the maintenance (or increase) of the pools of several protective molecules (e.g., neoxanthin, lutein, carotenes, α-tocopherol, heat-shock proteins HSP70, and raffinose), activities of antioxidant enzymes (e.g., superoxide dismutase, APX, GR, and catalase [CAT]), and the upregulated expression of ELIP (coding chloroplast early light-induced protein) and Chap20 (coding chloroplast 20 kDa chaperonin) genes were correlated with heat tolerance (up to day 37°C/night 30°C) at 380 and 700 μL CO2 L−1 for both C. arabica L. cvs. Icatu and IPR108 and C. canephora cv. Conilon clone 153. These authors also showed that upregulated expression of genes related to protective (ELIPS, HSP70, Chap20, and Chap60) and antioxidant (CAT, APXc, APXt+s) proteins was largely driven by temperature, while enhanced [CO2] promoted a greater upregulation of these genes mainly in C. canephora CL153 and C. arabica Icatu. In a more recent study analyzing the expression of genes related to lipid metabolism under elevated [CO2], heat, and their interaction, Scotti-Campos et al. (2019) showed that the strong remodeling (unsaturation degree) of membrane lipids observed during the heat shock (from day 37°C/night 30°C to day 42°C/night 34°C) of plants grown under high [CO2], coordinated with FAD3 (coding for fatty acid desaturase) downregulation in C. arabica and upregulation of lipoxygenase-coding genes LOX5A (in CL153 and Icatu) and LOX5B (in Icatu), could contribute to long-term acclimation of coffee chloroplast membranes to climate changes.

5.5 Salt Stress

In leaves of C. arabica cv. IAPAR59, upregulated expression of galactinol synthase genes CaGolS2 and CaGolS3 was observed after irrigation with 150 mM NaCl (dos Santos et al. 2011). In the same cultivar, salt stress upregulated the expression of CaM6PR and CaPMI genes and markedly downregulated that of CaMTD (de Carvalho et al. 2014). In parallel, leaf mannitol contents increased gradually to reach a peak after 12 days of salt stress imposition. However, this content was lower than in leaves of plants under water deprivation, indicating that coffee plants have different responses to drought and salinity.

The effects of salt stress were recently studied by RNAseq in leaves of C. arabica seedlings irrigated with normal water (control, ECw [electrical conductivity] = 0.2 dS.m−1) or with deep sea water (salt treatment, ECw = 2.3 dS.m−1) (Haile and Kang 2018). From the 19,581 genes aligned on the reference genome of C. canephora, in silico analyses identified 611 genes presenting significant DEG profiles between the control and salt treatment. Among the most expressed upregulated genes were Cc00_g13890, Cc04_g05080, and Cc08_g11060, coding for WRKY TFs; Cc06_g01240 coding a putative trihelix TF GT-3a already reported to control developmental processes and responses to abiotic and biotic stress (Park et al. 2004; Wang et al. 2016); and Cc10_g04710 coding the putative ethylene-responsive (ERF011) TF. On the other hand, salt stress also downregulated the expression of Cc05_g16570 (coding a putative MYB family transcription factor APL), Cc02_g17440 and Cc07_g03240 (both coding putative bHLH TFs), and Cc02_g10740 and Cc06_g21410 (coding putative transcription elongation factor SPT of RNA polymerase II). However, the DEG expression profiles of these TF-encoding genes were not verified in vivo by qPCR experiments.
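To put these figures in proportion, the share of salt-responsive genes can be computed directly from the two totals given above. The short Python sketch below is only an illustration: the variable names and the grouping of gene identifiers by direction are assumptions made here, while the identifiers and counts are those quoted in the text.

```python
# Totals quoted in the text for Haile and Kang (2018).
aligned_genes = 19581  # genes aligned on the C. canephora reference genome
deg_count = 611        # genes with significant DEG profiles

deg_fraction = deg_count / aligned_genes
print(f"{deg_fraction:.1%}")  # prints: 3.1%

# TF-encoding genes named in the text, grouped by direction of change.
# The dictionary layout is illustrative; annotations follow the text.
tf_genes = {
    "up": {
        "Cc00_g13890": "WRKY",
        "Cc04_g05080": "WRKY",
        "Cc08_g11060": "WRKY",
        "Cc06_g01240": "trihelix GT-3a",
        "Cc10_g04710": "ERF011",
    },
    "down": {
        "Cc05_g16570": "MYB family (APL)",
        "Cc02_g17440": "bHLH",
        "Cc07_g03240": "bHLH",
        "Cc02_g10740": "SPT elongation factor",
        "Cc06_g21410": "SPT elongation factor",
    },
}

for direction, genes in tf_genes.items():
    print(direction, len(genes))
```

In other words, roughly 3% of the aligned genes responded to the salt treatment, and the ten TF-encoding genes named here are split evenly between the two directions.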

5.6 Wounding

WRKY and NDR genes were previously reported as playing key roles in the molecular resistance responses of coffee to H. vastatrix (see Sects. 2.3 and 4.1). In the first study, Ganesh et al. (2006) reported upregulated expression of CaNDR1, CaWRKY1 (see Sect. 4.1), and CaR111 (coding a putative protein of unknown function) genes in leaves of C. arabica wounded by transversal cuts with scissors. A few years later, Petitot et al. (2008, 2013) showed that expression of both CaWRKY1a (CaCc) and CaWRKY1b (CaCe) homeologs was upregulated in wounded leaves of C. arabica (see Sect. 7). In parallel, wounding also markedly upregulated expression of the CaWRKY1a and CaWRKY1b genes in leaves of C. canephora and C. eugenioides, respectively, confirming that both genes were functional. In addition to CaWRKY1, Ramiro et al. (2010) showed that the CaWRKY19/20/21 genes, as well as CaWRKY15 and CaWRKY17, were also highly induced by wounding. In another work, Brandalise et al. (2009) showed that expression of CaIRL, coding an isoflavone reductase-like protein, was induced in leaves of C. arabica subjected to mechanical injury, which led to further study of the promoter of this gene (see Sect. 8).

6 Gene Expression in F1 Hybrids of C. arabica

In this context, the creation of new coffee varieties better adapted to biotic and abiotic stresses, to low levels of inputs, and to CC is now one of the challenges of several coffee research institutes (van der Vossen et al. 2015; Bertrand et al. 2019).

In C. arabica, it is possible to create and select in a relatively short time (e.g., around 8 years against 25 years for conventional breeding programs) new F1 hybrid varieties with increased production (e.g., under agroforestry) and also improved aromatic quality without increasing fertilizer quantities (Bertrand et al. 2006, 2011), by crossing pure commercial line varieties with phylogenetically distant plants corresponding to wild individuals from Ethiopia and Sudan, for example (van der Vossen et al. 2015). The objective of the H2020 BREEDCAFS (BREEDing Coffee for Agroforestry Systems) project, supported by the EU (2017–2021), is to identify robust markers (allelic, molecular, epigenetic) that could be used as early predictors to speed up future C. arabica breeding programs aiming to create new F1 hybrids with increased resistance and greater resilience to climate change in agroforestry systems (Bertrand et al. 2019). This project intends to compare the leaf transcriptomic profiles in F1 hybrids and cultivated varieties (and/or hybrids to their two parents) upon different abiotic stresses either performed in phytotrons and greenhouses (e.g., in order to test the effects of temperature, light, drought, CO2, and N2) or in field trials (or in networks of “demoplots” in farms). The numerous RNAseq studies planned to be performed within the framework of this project (Table 5) should also help us to better understand why the pure line varieties are less adapted to environmental constraints than F1 hybrids. For example, Toniutti et al. (2019b) showed that hybrid vigor (heterosis) could be explained by the modification of leaf expression profiles of several genes involved in the circadian clock (e.g., LHY and GIGANTEA), chlorophyll synthesis (e.g., POR1A and POR1B), and starch degradation (e.g., CcGWD1 and CcISA3) in the C. arabica F1 hybrid GPFA124 compared to those of the inbred Caturra line. In the same work, upregulated expression of chloroplast genes in the C. arabica GPFA124 was also reported (see Sect. 7).

Table 5 List of experiments (and related RNAseq analyses) planned in the frame of the BREEDCAFS project (see www.breedcafs.eu)

7 Expression of Chloroplast Genes

The chloroplast genome of C. arabica consists of 155,189 base pairs encoding 130 genes with 18 intron-containing genes (Samson et al. 2007). In a pioneering work, Dinh et al. (2016) analyzed the effects of drought, cold, or combined drought and heat stresses on intron splicing and expression patterns of 48 chloroplast genes from C. arabica. By RT-qPCR, these authors showed that the transcript levels of chloroplast mRNAs were globally decreased in seedlings subjected to drought or cold treatments. For example, expression of rbcL (coding the large subunit of Rubisco) and psaA and psaB (coding photosystem I proteins) was significantly reduced in C. arabica under cold stress conditions but not under drought. Regarding intron-containing genes, it was also shown that the splicing efficiencies of the trnG, trnK, and trnA genes increased upon drought, combined drought and heat, or cold stress treatments, while these efficiencies decreased for trnL under these stresses. On the other hand, the splicing efficiencies of the mRNA genes rps16, atpF, petB, and rpl2 decreased upon drought but increased upon cold stress treatment.

Overexpression of the CaPsbB gene (coding the photosystem II CP47 chlorophyll apoprotein) was also reported, either by in silico (Vieira et al. 2006; Mondego et al. 2011; Vinecky et al. 2012) or by in vivo (Mofatto et al. 2016) analyses, in leaves of drought-stressed coffee plants but also in those infected by H. vastatrix (Fernandez et al. 2012).

In addition to the circadian genes (see Sect. 6), Toniutti et al. (2019b) also reported increased photosynthetic electron transport efficiency in the C. arabica hybrid GPFA124 probably explained by higher expression of chloroplast genes CaPsbA and CaPsbD (coding the D1 and D2 proteins of PSII, respectively); CaPetA, CaPetD, and CaPetB (coding proteins of the cytB6/f complex); and CaPsaA, CaPsaB, and CaPsaJ (coding proteins of PSI), in this hybrid compared to the C. arabica cv. Caturra.

8 Coffee Promoters

The expression studies previously detailed also led to the identification of coffee promoters (De Almeida et al. 2008). Several of them were functionally characterized using uidA (coding the β-glucuronidase) as the reporter gene by transgenic approaches either in Nicotiana tabacum or in Coffea sp. The first promoter was cloned from the CaCSP1 gene of C. arabica coding for the 11S seed storage protein and was shown to function as a bean (endosperm)-specific promoter in transgenic tobacco plants (Marraccini et al. 1999). A similar result was also observed for the shorter and medium promoter fragments of the CaLTP gene coding non-specific lipid transfer proteins (Cotta et al. 2014). Leaf-specific expression was also reported for the CaRBCS1 and CcMXMT1 coffee promoters in transgenic tobacco (Marraccini et al. 2003; Satyanarayana et al. 2005). The SERK1 (somatic embryogenesis receptor-like kinase 1) promoter from C. canephora was also shown to drive uidA expression in different embryo structures such as globular, heart, torpedo, and cotyledonal embryos present at 60 days after embryogenic induction (Jiménez-Guillen et al. 2018). Regarding abiotic stress, Brandalise et al. (2009) showed that the promoter of CaIRL was induced by wounding in leaves of N. tabacum. Nobres et al. (2016) analyzed the promoter function of CaHB12 from C. arabica, a gene coding a member of the homeodomain-leucine zipper I subfamily (HD-Zip) and conferring greater tolerance to drought stress when overexpressed in Arabidopsis (Alves-Ferreira et al. 2012). The study of transgenic A. thaliana plants bearing pCaHB12::GUS constructs showed that this promoter was active in leaves during drought and in roots after polyethylene glycol or mannitol treatments. On the other hand, the different haplotypes of the CcDREB1D promoter from C. canephora were shown to be upregulated by different abiotic stresses in the leaves of C. arabica (see Sect. 5.1) and N. tabacum transgenic plants (Alves et al. 2017, 2018; de Aquino et al. 2018). Regarding biotic stress, Petitot et al. (2013) analyzed the promoter activities of the CaWRKY1a (named pW1a) and CaWRKY1b (named pW1b) homeologous genes, previously identified as induced by CLR infestation in C. arabica leaves (see Sects. 2.3, 4.1, and 5.6), in transient assays of N. benthamiana leaves and in stable transgenic plants of C. arabica. These authors also showed increased activities of both promoters in leaves of tobacco treated with SA or in those of coffee infected with CLR, as well as increased activities of pW1a upon wounding. The other coffee promoters already described in the literature but not yet tested in transgenic plants are cited in Table 6.

Table 6 List of coffee promoters already described in the literature

9 Coffee Small RNA (sRNA)

Using small (20–26 nt) homologous sequences, small RNAs (sRNA) are known to play important roles by silencing pathways at the transcriptional or translational levels. Plant sRNAs are classified as (1) microRNAs (miRNAs), which are derived from self-complementary hairpin structures, and (2) small interfering RNAs (siRNAs), which are derived from double-stranded RNA (dsRNA) or hairpin precursors (Borges and Martienssen 2015). The core mechanism of sRNA production requires the endonuclease activity of DICER-LIKE 1 (DCL1), with ARGONAUTE (AGO) proteins acting as effectors of silencing, while siRNA biogenesis involves the action of RNA-dependent RNA polymerase (RDR), Pol IV, and Pol V. With the release of the C. canephora genome (Denoeud et al. 2014), coffee sRNAs could be identified.
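As an illustration of the size criterion, a first-pass filter for candidate sRNA reads could simply keep sequences whose length falls within the quoted window. The Python sketch below assumes that the size range given above reads as 20 to 26 nt inclusive; the function name and the synthetic sequences are hypothetical and do not come from any coffee dataset.

```python
# Size window assumed from the text (20 to 26 nt, inclusive).
SRNA_MIN_NT = 20
SRNA_MAX_NT = 26


def in_srna_size_range(seq):
    """Return True if the sequence length lies within the sRNA size window."""
    return SRNA_MIN_NT <= len(seq) <= SRNA_MAX_NT


# Synthetic placeholder reads (not real coffee sequences).
reads = [
    "A" * 19,            # 19 nt, too short
    "ACGU" * 5,          # 20 nt, lower bound
    "ACGU" * 6 + "AC",   # 26 nt, upper bound
    "A" * 27,            # 27 nt, too long
]

kept = [r for r in reads if in_srna_size_range(r)]
print([len(r) for r in kept])  # prints: [20, 26]
```

A length filter of this kind is only a preliminary step; it says nothing about hairpin precursors or biogenesis, which distinguish miRNAs from siRNAs as described above.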

One of the first attempts to study coffee miRNAs was performed by Nellikunnumal and Chandrashekar (2012), who identified 18 miRNAs, belonging to 12 families, from C. canephora ESTs by computational approaches. By RT-PCR, these authors detected expression for seven families (viz., mir156, mir169, mir172, mir319, mir393, mir395, and mir396) in C. canephora leaves. By the same computational approach, Rebijith et al. (2013), Loss-Morais et al. (2014), and Devi et al. (2016) also identified miRNAs in C. arabica and C. canephora, showing that the majority of their potential targets corresponded to mRNAs coding proteins involved in transcriptional regulation and signal transduction pathways. In another study, Akter et al. (2014) identified a potential miRNA (named mir393) from C. arabica ESTs and also showed that the potential targets of this sequence included several genes coding transcription factors (e.g., bHLH7 and WRKY TFs) or proteins involved in the auxin signaling pathway and plant defense responses (e.g., auxin signaling F-box 2 and auxin transporter protein 1). Using a specific pipeline to search for miRNA homologs in expressed sequence tag (EST) and genome survey sequence (GSS) coffee databases, Chaves et al. (2015) identified 36 microRNAs and a total of 616 and 362 potential target genes for C. arabica and C. canephora, respectively. Using a stem-loop RT-PCR assay, these authors also detected a higher amount of miRNAs (miRNAs 171, 172, 390, and 167) in leaves of C. arabica than in those of C. canephora, suggesting a possible role of sRNA in regulating the C. arabica transcriptome.

Fernandes-Brum et al. (2017b) identified 11 AGO proteins, nine DCL-like proteins, eight RDR proteins, and 48 other proteins implicated in the sRNA pathways. These authors also identified (1) 235 miRNA precursors producing 317 mature miRNAs belonging to 113 MIR families and (2) 2239 putative C. canephora miRNA targets in different pathways. In another study, Bibi et al. (2017) also identified miRNAs potentially targeting 150 genes coding transcription factors as well as hypothetical proteins, transporters, structural constituents, and proteins involved in multiple biological and metabolic processes, signal transduction, growth and development, stress-related processes, and disease-related processes.

In the study analyzing coffee memory to multiple drought exposures, de Freitas Guedes et al. (2018) also reported upregulated expression of mir398 and mir408 by the drought cycles in C. canephora. In addition to drought, these genes were also reported to be regulated in other plants by ABA, heat, UV, and also biotic stress events (Zhu et al. 2011; Khraiwesh et al. 2012; Guan et al. 2013). Interestingly, transgenic chickpea plants overexpressing mir408 were shown to be tolerant to several stresses including drought (Hajyzadeh et al. 2015; Ma et al. 2015). In a recent study, dos Santos et al. (2019) analyzed the transcriptome in N-starved roots of C. arabica and also identified 86 microRNA families targeting 253 genes. RT-qPCR assays showed that expression of mir169, mir171, mir167, mir393, and mir858 was upregulated in roots after N-starvation, while mircar1 was downregulated after prolonged N-restriction. Altogether, these results highlight the role that sRNAs might play in modulating the expression of genes involved in the adaptive responses of coffee plants to environmental factors.

10 Conclusions

As for many other crops, gene identification and characterization are of fundamental biological interest in coffee to understand the transcription networks involved in important agronomic traits and further to identify SNPs that can serve as markers of specific phenotypes to better drive future breeding programs. In that way, the high number of large-scale expression analyses, together with the recent access to long-read sequencing of transcripts (Cheng et al. 2017), to reference transcriptomes (Yuyama et al. 2016; Cheng et al. 2018), and to reference genomes of C. canephora (Denoeud et al. 2014) and C. arabica (de Kochko et al. 2015, 2017; Gaitan et al. 2015; Morgante et al. 2015; Yepes et al. 2016), now opens the way to identify SNPs associated with bean biochemical compound content (Tran et al. 2018) and adaptation to environmental factors (de Aquino et al. 2019) and to initiate marker-assisted selection (Alkimim et al. 2017) and genome-wide association studies (Andrade 2018; Sant’Ana et al. 2018; Carneiro et al. 2019). With the help of CRISPR/Cas9 technology (Breitler et al. 2018), it is now possible to greatly shorten the time required to create new coffee varieties with improved agronomic traits under CC.