Introduction

Cork oak (Quercus suber L.) is one of the most important Mediterranean forest tree species. Native to dry and semi-arid regions, it plays an important ecological role as an environmental protector, since it is one of the key species of the montado ecosystem, contributing to soil conservation as well as to flora and fauna biodiversity. Socially, cork oak stands underpin a sustainable activity, providing a source of income for rural populations across the Mediterranean basin owing to the economic value of cork.

In the last two decades, climate change, exotic pathogens, and pest translocations between previously separated ecosystems have caused extensive decline and mortality of several tree species worldwide (Allen et al. 2010). One of the main tree genera affected is Quercus (Brasier 1996). In the Iberian Peninsula, a severe decline affecting European oak forests, in which cork oak is largely represented, has been well documented since the 1980s (Brasier 1992; Brasier 1996). Propagation from naturally regenerated seeds does not allow the rescue of valuable phenotypes (Valladares et al. 2004), which can only be identified in adult trees. Since the capacity for vegetative propagation of the genus Quercus through traditional methods is very low (Vieitez et al. 2012), somatic embryogenesis emerges as a valuable tool for integration in breeding programs. Induction of Q. suber somatic embryogenesis has been successfully achieved by distinct research groups (Bueno et al. 2000; Pinto et al. 2002; Pintos et al. 2008; Vieitez et al. 2012). The possibility to modulate somatic embryogenesis and embryo development in cork oak allows the storage of cork oak embryos for several years, which is otherwise difficult owing to the recalcitrance of the acorns. In addition, it opens the possibility to perform functional genomic studies such as genetic transformation for herbicide resistance (Álvarez et al. 2009), gene silencing (Zhang et al. 2012), and overexpression (Mallón et al. 2014), or even precise genome editing by newly emerged techniques such as the Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/CRISPR-associated protein 9 (Cas9) system (Fan et al. 2015).

Somatic embryos follow developmental patterns similar to those of their zygotic counterparts. Embryo development and maturation require the concerted action of several signaling pathways integrating genetic, epigenetic, and hormonal regulation (Gutierrez et al. 2007; Feng et al. 2010; Gao et al. 2012). Phytohormonal homeostasis modulates embryo development and involves several types of hormones, including abscisic acid (ABA), auxin (AUX), brassinosteroid (BR), cytokinin (CK), ethylene (ET), gibberellin (GA), jasmonic acid (JA), and salicylic acid (SA).

Embryo development proceeds through a series of spatially and temporally regulated gene expression networks from which many genes have already been characterized (Gao et al. 2012). In woody species, recent advances provided insight into mechanisms involved in transition from morphologically mature to physiologically mature somatic embryo namely in Picea asperata (Jing et al. 2017). In Citrus sinensis, small RNAs and degradome high-throughput sequencing revealed a group of miRNAs/siRNA as well as their targets differentially regulating specific biological process and transcription factor expression (Wu et al. 2015).

An effort to explain key regulatory processes involved in distinct developmental stages of Q. suber somatic embryogenesis has been made by a proteomic approach (Gomez-Garay et al. 2013). However, precise molecular mechanisms involved in somatic embryo induction, maturation, and germination in this species are still poorly known.

An assessment of the transcriptome of embryo development will contribute to the comprehension of genes involved in this process and may help to improve the process of in vitro embryo production. Identification of gene expression patterns associated with specific stages of embryo development is critical to understand the molecular and biochemical basis characteristic of each specific stage of embryo development. Knowledge generated by such studies will contribute to and enhance the quality of mature embryos for use in forestation processes.

Next-generation sequencing applied to transcriptomics enables the unraveling of developmental transcriptome dynamics. RNA-Seq and the quantification of differentially expressed genes not only describes genes being expressed in a particular stage of development but also quantifies the expression levels of each transcript. In this context, we have analyzed the transcriptome of Q. suber somatic embryo in four distinct embryo developmental stages, from globular to mature cotyledonary. With this approach, we expected to identify potential important genes in Q. suber somatic embryo development and maturation, focusing specifically on transcripts coding for transcription factors, transcription regulators, and chromatin regulators, as well as transcripts related to hormone biosynthesis, metabolism, transport, and signaling.

Materials and methods

Sample preparation and RNA sequencing

Branches up to 5 cm in diameter were collected from one adult cork oak tree growing in Cercal do Alentejo, Portugal, in April 2011. Branch segments without lateral branches and leaves, 20 cm in length and between 2 and 4 cm in diameter, were forced to sprout in a Fitoclima S600PLH chamber (Aralab) at 25 ± 1 °C, with a 16-h photoperiod provided by cool-white fluorescent tubes (200 μmol m⁻² s⁻¹) and 90–100% relative humidity. Expanding leaves measuring 1–2 cm from base to apex were excised from epicormic shoots and used as initial explants for somatic embryogenesis induction, according to Álvarez et al. (2004). Quercus suber embryogenic clusters were grown in MSSH medium, containing MS micronutrients (Murashige and Skoog 1962) and SH macronutrients (Schenk and Hildebrandt 1972) and supplemented with MS vitamins (Murashige and Skoog 1962), in a climatic chamber at 25 °C under a light/dark cycle of 16/8 h. Somatic embryogenic clusters were subcultured every 30 days. After 4 years in culture, embryos were selected and pooled according to developmental stage. We selected four distinct Q. suber embryo developmental stages (Fig. 1): (1) the globular stage (ST1), in which embryos start to emerge from the proembryogenic masses, acquiring a radially symmetrical morphology; (2) the heart/torpedo stage (ST2), in which the first discernible organs, the two cotyledons, emerge and embryo morphology changes visibly from radial to bilateral symmetry; (3) the immature cotyledonary stage (ST3), in which the cotyledons are well defined and most embryogenic tissues and organs are formed; and (4) the mature cotyledonary stage (ST4), in which embryos pass through several morphological and biochemical changes, namely the expansion of the storage organs (the cotyledons), repression of germination, and acquisition of desiccation tolerance (Von Arnold et al. 2002).

Fig. 1 Morphology of cork oak somatic embryo in the different developmental stages used for RNA-Seq analysis. Somatic embryo at the globular (ST1) (a), heart/torpedo-shaped (ST2) (b), early cotyledonary (ST3) (c), and mature cotyledonary (ST4) (d) stages. Bars = 0.5 mm in (a, b, c), 0.5 cm in (d)

Embryos from each distinct stage (20–40 fresh embryos) were pooled together and ground with a mortar in the presence of liquid nitrogen for RNA extraction. Total RNA was extracted from the four embryo stages described above using the AMBION RNAqueous® Total RNA Isolation kit (Thermo Fisher Scientific, Waltham, MA, USA) including Plant RNA Isolation Aid solution (Thermo Fisher Scientific, Waltham, MA, USA), according to the manufacturer’s instructions. RNA samples were treated with the AMBION TURBO DNA-free™ Kit (Thermo Fisher Scientific, Waltham, MA, USA) for DNA removal. RNA samples were analyzed by electrophoretic separation of nucleic acids in an Agilent 2100 Bioanalyzer (Agilent Technologies, CA, USA) and used for cDNA synthesis and library construction for sequencing when RIN > 8.

RNA-Seq library preparation and sequencing were performed by the Beijing Genomics Institute (BGI; Hong Kong, China). Paired-end libraries were prepared following the protocol of the Illumina® TruSeq RNA Library Preparation kit and sequenced in one lane on the Illumina HiSeq™ 2000 platform, with a read length of 100 bp. Raw reads were cleaned by removing adaptor sequences, empty reads, and low-quality sequences.

Transcriptome data assembly and annotation

Illumina paired-end reads were pre-processed according to quality (minimum of 20) and length (minimum of 80) using a custom Perl script combined with Sickle (Joshi and Fass 2011) in order to remove or trim reads with low average quality and reads containing undetermined nucleotides (N’s). Pre-processed reads were assembled with the Trinity platform (Grabherr et al. 2011). Assembly statistics were generated by QUAST 2.3 (Gurevich et al. 2013). Protein sequences were predicted from the transcriptome assembly with TransDecoder (http://transdecoder.github.io), which extracts the longest open reading frames (ORFs) and identifies ORFs with homology to known proteins using Blastp and HMMER against the UniProt and Pfam databases, producing a set of candidate coding regions.
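The read-level criteria described above can be illustrated with a minimal Python sketch. It is not the custom Perl script or Sickle (which also trims reads rather than only discarding them). The function names, the Phred+33 quality encoding, the use of the mean quality per read, and the rule that both mates must pass are assumptions made for illustration only.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def passes_filters(seq: str, quality: str,
                   min_quality: float = 20.0, min_length: int = 80) -> bool:
    """Keep a read only if it is long enough, contains no N, and has adequate mean quality."""
    if len(seq) < min_length:
        return False
    if "N" in seq.upper():
        return False
    return mean_phred(quality) >= min_quality


def filter_pairs(pairs):
    """Keep a read pair only when both mates pass (hypothetical pairing policy).

    Each mate is a (sequence, quality) tuple.
    """
    return [(r1, r2) for r1, r2 in pairs
            if passes_filters(*r1) and passes_filters(*r2)]
```

The thresholds default to the values given in the text (quality 20, length 80); everything else is a placeholder.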

Pre-processed paired-end reads were aligned to the assembled contigs using BWA (Li and Durbin 2010). Mapped reads with a mapping quality (MAPQ) value ≥ 10, a single best hit (X0:i:1), and no suboptimal or alternative hits (X1:i:0) were selected with Samtools (Li et al. 2009) and considered uniquely mapped reads. Digital differential expression analysis was performed using edgeR (Robinson et al. 2009), examining differential expression (DE) from a table of read counts generated from the uniquely mapped reads with the featureCounts R function. A common dispersion value was used, assuming a biological coefficient of variation (BCV) of 0.1, which is a typical value reported for RNA-Seq studies involving genetically identical organisms arising from controlled experiments (McCarthy et al. 2012). Genes with very low counts across the dataset of predicted genes were discarded to avoid interference with the statistical analysis. After DE analysis, correction for multiple testing was performed by applying a false discovery rate (FDR) cutoff of ≤ 0.01 to produce a final list of differentially expressed genes (DEGs) and digital gene expression values in counts per million (CPM).
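As a rough illustration of the selection and scaling steps above, the following Python sketch filters SAM records by the stated criteria and converts raw counts to CPM. It is a sketch under assumptions, not the Samtools or edgeR commands used in the study: the function names are hypothetical, the CPM shown is the plain library-size scaling (edgeR can additionally apply normalization factors), and the last line only restates the edgeR convention that the dispersion is the square of the BCV.

```python
def is_unique_hit(sam_line: str, min_mapq: int = 10) -> bool:
    """True if a SAM record is mapped, has MAPQ >= min_mapq, and carries X0:i:1 and X1:i:0."""
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:          # a SAM record has 11 mandatory fields
        return False
    if int(fields[1]) & 0x4:      # FLAG bit 0x4: read unmapped
        return False
    if int(fields[4]) < min_mapq:
        return False
    tags = set(fields[11:])       # optional tags written by the aligner
    return "X0:i:1" in tags and "X1:i:0" in tags


def counts_per_million(counts: dict) -> dict:
    """Scale raw read counts by library size (no normalization factors applied)."""
    library_size = sum(counts.values())
    return {gene: 1e6 * n / library_size for gene, n in counts.items()}


BCV = 0.1                 # value stated in the text
DISPERSION = BCV ** 2     # edgeR convention: dispersion = BCV squared (about 0.01)
```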

The transcriptome assembly and the whole set of DEGs were functionally annotated using Blastx against the non-redundant (nr) plant protein sequence database from NCBI. Results were loaded into Blast2GO (Conesa and Gotz 2008) to perform protein domain and gene ontology (GO) annotation using the InterProScan (Zdobnov and Apweiler 2001; Jones et al. 2014) function. The ANNEX (annotation augmentation) function was used to refine annotations (Conesa and Gotz 2008), and GOSlim (plants) was used to generate plant-specific GO terms. Results were graphically represented with Cytoscape (Shannon et al. 2003).

Overrepresented GO terms of differentially expressed genes were identified with BiNGO (Maere et al. 2005) (Cytoscape plugin) using as a reference a custom GO annotation of the assembled sequences generated and annotated with Blast2GO. From those results, an enrichment analysis was performed. GO terms with p values < 0.05 were considered to be significantly different and enriched in our sets of DEGs.
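To make the idea of over-representation concrete, the sketch below computes a one-sided hypergeometric p value for a single GO term. The source does not state which statistical test or multiple-testing correction was selected in BiNGO, so the choice of test here is an assumption, and the function and argument names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for a hypergeometric draw.

    N: genes in the reference set (annotated assembly)
    K: reference genes annotated with the GO term
    n: genes in the test set (e.g. one DEG cluster)
    k: test-set genes annotated with the GO term
    """
    upper = min(n, K)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total
```

Under this sketch, a term would be called enriched when the resulting p value (after whatever correction is applied) falls below the 0.05 threshold given in the text.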

Gene clustering analysis and differential expression profiling

Hierarchical clustering was performed using MultiExperiment Viewer (MeV v4.9.0) (Saeed et al. 2006) with the Euclidean distance and average linkage model. K-means clustering was done with the Euclidean distance to produce distinct expression profile clusters. A log2 transformation was applied to all expression values before the clustering analysis.
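The K-means step can be sketched in plain Python as follows. This is not the MeV implementation: the pseudocount added before the log2 transformation, the caller-supplied initial centroids, and all function names are assumptions chosen to keep the example deterministic and self-contained.

```python
import math


def log2_profile(values, pseudocount=1.0):
    """log2-transform expression values; the pseudocount (an assumption) avoids log2(0)."""
    return [math.log2(v + pseudocount) for v in values]


def euclidean(a, b):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(profiles, centroids, max_iter=100):
    """Lloyd's algorithm with caller-supplied initial centroids.

    Returns (labels, centroids), where labels[i] is the cluster index of profiles[i].
    """
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each profile goes to its nearest centroid.
        new_labels = [min(range(len(centroids)),
                          key=lambda j: euclidean(p, centroids[j]))
                      for p in profiles]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Each profile here would hold one gene's expression across ST1 to ST4, so genes with similar trajectories end up in the same cluster.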

Differential gene expression validation by RT-qPCR

A set of 19 DEGs was analyzed by reverse transcription quantitative real-time PCR (RT-qPCR) in order to validate the expression profiles obtained in the bioinformatics analyses (Online Resource_1). Two micrograms of the total RNA previously extracted for RNA-Seq were used for cDNA synthesis using the QuantiTect Reverse Transcription kit (Qiagen, Valencia, CA, USA), which includes an additional genomic DNA elimination step and uses a mix of oligo(dT) and random hexamer primers. Specific primers were designed with Primer3Plus (Untergasser et al. 2007). RT-qPCR experiments were carried out in an iCycler iQ5 Instrument (Bio-Rad Laboratories, Hercules, CA, USA) using the Sso Advanced Universal SYBR Green Master mix (Bio-Rad Laboratories, Hercules, CA, USA) in 96-well plates. Three replicates were performed in reaction mixtures of 20 μL containing 10 μL of Master mix, 400 nM of each specific primer (forward/reverse), and 1 μL of cDNA diluted 1:100 as template. All selected genes were amplified with the following PCR program, with a single fluorescence reading taken at the end of each cycle: 95 °C for 10 min, 45 cycles of 10 s at 95 °C, 15 s at 60 °C, 20 s at 60 °C, and 15 s at 72 °C, except for AINTEGUMENTA (ANT), NUCLEAR CAP-BINDING PROTEIN SUBUNIT 1 (ABH1), and TRANSLATIONALLY CONTROLLED TUMOR PROTEIN 1 (TCPC1), for which the annealing temperature was 61 °C. To distinguish specific from nonspecific products and primer dimers, a melting curve was obtained immediately after amplification. Normalization was performed using two reference genes: ACTIN (ACT) and CLATHRIN ADAPTOR COMPLEXES (CACs).
Normalized relative quantities (NRQ) were calculated by \( NRQ=\frac{E_{goi}^{\Delta Ct_{goi}}}{\sqrt[f]{\prod_{o=1}^{f}E_{ref_o}^{\Delta Ct_{ref_o}}}} \), where E is the amplification efficiency for each primer pair, f the number of reference genes used to normalize the data, goi the gene of interest, ref_o the o-th reference gene, and ∆Ct the Ct of the sample with the highest Ct across samples minus the Ct of the sample under test (Hellemans et al. 2007). Expression values from RNA-Seq experiments were calculated with the edgeR package. Both RNA-Seq and RT-qPCR gene expression values were log2 transformed and compared by Pearson correlation.
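The NRQ formula and the subsequent comparison can be sketched in Python as below. The function names and the data layout (a dictionary of Ct values per sample, and a list of efficiency and ∆Ct pairs for the reference genes) are assumptions for illustration; the arithmetic follows the definitions given in the text.

```python
import math


def delta_ct(ct_by_sample: dict, sample: str) -> float:
    """Highest Ct across samples minus the Ct of the sample under test."""
    return max(ct_by_sample.values()) - ct_by_sample[sample]


def nrq(e_goi: float, dct_goi: float, refs) -> float:
    """Normalized relative quantity.

    refs is a list of (efficiency, delta_ct) tuples, one per reference gene;
    the denominator is the geometric mean of E_ref ** delta_ct_ref.
    """
    f = len(refs)
    ref_product = math.prod(e ** dct for e, dct in refs)
    return (e_goi ** dct_goi) / (ref_product ** (1.0 / f))


def pearson_r(x, y) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

For example, with an efficiency of 2 for every primer pair, a gene of interest with ∆Ct = 3 and two reference genes each with ∆Ct = 1 gives NRQ = 2³ / √(2 · 2) = 4.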

Gene annotation

To identify genes encoding transcription factors (TFs), transcription regulators (TRs), and chromatin regulators (CRs), all DEGs were analyzed with the PlantTFcat tool (Dai et al. 2013). For the analysis and classification of genes related to phytohormone homeostasis and genes involved in germination, a Blastx search against the Arabidopsis Hormone Database (AHD2.0, http://ahd.cbi.pku.edu.cn/) (Jiang et al. 2011) was performed with all DEGs (E value ≤ 10^−10, identity > 60%, query coverage > 70%), producing a list of genes. Other embryogenesis-related genes were identified manually by searching for genes annotated with the terms “embryo” and “embryogenesis”.
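The hit-selection thresholds above can be expressed as a small filter over tabular BLAST output. The column layout assumed here (query, subject, percent identity, E value, query coverage, as produced by a custom BLAST+ tabular format such as `-outfmt "6 qseqid sseqid pident evalue qcovs"`) is an assumption, as are the class and function names; the source does not describe how the filtering was implemented.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float   # percent identity
    evalue: float
    qcov: float     # percent query coverage


def parse_hit(line: str) -> Hit:
    """Parse one tab-separated line in the assumed five-column layout."""
    query, subject, pident, evalue, qcov = line.rstrip("\n").split("\t")
    return Hit(query, subject, float(pident), float(evalue), float(qcov))


def keep_hit(hit: Hit, max_evalue: float = 1e-10,
             min_identity: float = 60.0, min_qcov: float = 70.0) -> bool:
    """Apply the thresholds from the text: E value <= 1e-10, identity > 60, coverage > 70."""
    return (hit.evalue <= max_evalue
            and hit.pident > min_identity
            and hit.qcov > min_qcov)
```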

Data availability

Raw reads of the transcriptomic data have been deposited in the NCBI Sequence Read Archive (SRA) under accession numbers SAMN05898489, SAMN05898490, SAMN05898491, and SAMN05898492. The four records correspond to the Quercus suber somatic embryo development BioSamples for stage 1 (SAMN05898489), stage 2 (SAMN05898490), stage 3 (SAMN05898491), and stage 4 (SAMN05898492).

Results

RNA-Seq analysis and global functional annotation

A de novo transcriptome sequence assembly was generated from an embryogenic cell line of a single genotype to study Q. suber somatic embryo development and maturation. Sequencing of cDNA libraries of four different embryo developmental stages generated 160,366,471 paired-end reads, of which 156,088,387 were kept after pre-processing. The transcriptome assembly resulted in 143,960 contigs ranging from 500 to 16,793 bp, with an N50 length of 1734 bases (Table 1). A total of 66,693 candidate coding regions were identified within the assembled transcriptome, of which 22,109 were co-expressed among the four embryo developmental stages (Fig. 2a). In addition, 671, 811, 1014, and 1480 genes were found exclusively in ST1, ST2, ST3, and ST4, respectively. Also, 373 genes were uniquely present in ST1 and ST2, 786 genes were unique to ST2 and ST3, and 1088 genes were expressed only in ST3 and ST4.
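For readers unfamiliar with the N50 statistic reported above, a minimal Python sketch of its usual definition follows. QUAST computed the value in the study; the function below is only an illustration with a hypothetical name.

```python
def n50(lengths):
    """Length L such that contigs of length >= L contain at least half of all assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs supplied")
```

For contigs of 2, 3, 4, 5, and 6 bases (20 bases in total), the two longest contigs already cover 11 bases, so the N50 is 5.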

Table 1 Statistics of Quercus suber somatic embryo assembled transcriptome

Fig. 2 Venn diagram of all expressed genes between the four developmental stages (top). Hierarchical clustering of DE genes among libraries (bottom). Green indicates downregulated and red indicates upregulated genes

After normalization and log2 transformation, a heatmap was generated to show a global overview of gene expression, during somatic embryo development (Fig. 2b). Hierarchical clustering showed that ST2 and ST3 grouped closer to ST1 while ST4 clustered in a different group (Fig. 2b). Moreover, multiple gene groups were clustered together showing similar expression patterns across developmental stages.

Functional categories were assigned to all genes in terms of GO (Fig. 3). A total of 29,725 genes (44.57%) were assigned to one or more plant GOSlim categories. Of those, 18,360 (61.77%) were associated with biological processes, 24,712 (83.14%) with molecular functions, and 14,088 (47.40%) with cellular components. A total of 8558 (28.80%) genes were annotated in all three main GO categories. Regarding biological processes, the most represented GO categories at level 2 are cellular process (31.73%) and metabolic process (28.28%). Other well-represented biological processes such as single-organism process (8.77%) and response to stimulus (5.68%) were also identified (Fig. 3). In terms of molecular function categories, GO terms at level 3 associated with the annotated genes reveal that heterocyclic compound binding (23.08%), organic cyclic compound binding (23.08%), transferase activity (17.11%), small molecule binding (16.34%), and hydrolase activity (13.46%) are the most represented GO terms (Fig. 3). Cellular component-associated GO terms at level 4 reveal that most of the genes belong to the intracellular membrane-bounded organelle (28.75%), cytoplasm (28.28%), and cytoplasmic part (23.73%) components (Fig. 3).

Fig. 3 Gene Ontology functional classification of all expressed genes on biological process at level 2, molecular function at level 3 and cellular localization at level 4

Gene expression and differential gene expression analysis

Uniquely mapped reads of each cDNA library were used for gene expression analysis with the featureCounts R function. The analysis identified 11,507 DEGs (FDR < 0.01) across all developmental stages. Differential expression was assessed in the following comparisons: ST1 vs ST2 (ST12), ST2 vs ST3 (ST23), ST3 vs ST4 (ST34), and non-sequential developmental stages (Fig. 4a). Smear plots of log (fold change) vs average log (counts per million) and the respective volcano plots show the distribution of DEGs among sequential developmental stages (Online Resource_2). In ST12, the analysis identified 2995 upregulated and 2473 downregulated genes (Fig. 4b). In ST23, 2265 upregulated and 2420 downregulated genes were found (Fig. 4b). Finally, in ST34, the total number of DEGs was higher than in ST12 and ST23, accounting for 3653 upregulated and 3164 downregulated genes (Fig. 4b).

Fig. 4 Downregulated and upregulated genes (a) and Venn diagram showing the differentially expressed genes among three comparisons of developmental stage pairs (b)

In order to understand patterns of gene expression during somatic embryo development, a K-means clustering analysis using Euclidean distance was performed on log2-transformed expression values. K-means clustering produced 30 clusters with different expression patterns, each containing between 121 and 743 significantly regulated genes across the four developmental stages (Online Resource_3).

The expression levels obtained by the bioinformatics analyses were confirmed by RT-qPCR of 19 genes related to embryogenesis and with distinct profiles across somatic embryo developmental stages (Fig. 5). Agreement between the gene expression levels obtained by RT-qPCR and by Illumina sequencing was assessed by Pearson’s correlation, with the majority of the genes showing a high correlation (r > 0.9). A moderate correlation (0.6 < r < 0.9) was observed for six of the selected genes.

Fig. 5 Validation of RNA-Seq transcript expression profiles. RNA-Seq and RT-qPCR data were compared by Pearson correlation, expressed as the r value. The y-axes show the log2 relative expression levels from RNA-Seq (right y-axis scale) and RT-qPCR (left y-axis scale) in each developmental stage

The GO annotation for the 11,507 DEGs revealed that cellular and metabolic processes are the main categories represented in biological processes, accounting for 31.51% and 28.31%, respectively. Genes involved in other important biological processes such as single-organism processes (8.95%), response to stimulus (6.60%), cellular component organization (4.57%), localization (3.99%), multicellular organism processes (3.38%), and developmental processes (3.33%) were also identified (Online Resource_4). Regarding molecular function categories, GO terms for the DEGs are mainly related to binding and catalytic activities. In the binding subset, the main groups represented were heterocyclic compound binding (21.71%), organic cyclic compound binding (21.71%), small molecule binding (15.05%), protein binding (4.09%), and carbohydrate binding (1.34%). In the subset of catalytic activities, two main groups were represented: transferase activity (18.63%) and hydrolase activity (13.67%). Transcription factor activity (1.30%) and signal transducer activity (0.80%) were also represented (Online Resource_4). In terms of cellular components, the identified genes are mainly related to intracellular membrane-bounded organelle (29.07%), cytoplasm (27.41%), cytoplasmic part (23.27%), intracellular non-membrane-bounded organelle (4.12%), thylakoid (3.76%), and cell wall (2.53%) (Online Resource_4). Data from all GOSlim-annotated DE genes are listed in Online Resource_5. In order to identify overrepresented biological processes in the DEG clusters defined by the K-means clustering analysis, we compared the proportion of each GO term in each gene cluster with its occurrence in the transcriptome assembly.
This analysis showed that the significantly enriched GO terms were generation of precursor metabolites and energy (clusters 5, 8, and 25); secondary metabolic processes (clusters 8 and 11); cell cycle (clusters 15 and 22); photosynthesis (clusters 5 and 8); DNA metabolic processes (cluster 15); translation (cluster 16); cellular component organization, DNA metabolic processes, embryo development, and epigenetic regulation of gene expression (all in cluster 22); cellular protein modification process and carbohydrate metabolic process (cluster 28); and response to biotic and external stimulus (cluster 29) (Fig. 6). These results give a broad view of the overrepresented biological processes in some gene clusters, allowing gene expression patterns to be linked to biological processes, molecular functions, and cellular components.

Fig. 6 Biological process network of GO term enrichment for DE genes in K-means clusters with over-represented GO terms. The node size represents the number of genes associated to a given GO term and node fill color reflects the adjusted P value

Transcription factors, transcription regulators, and chromatin regulator genes involved in cork oak somatic embryogenesis

In order to identify and categorize TF, TR, and CR genes related to the somatic embryo developmental stages, a list of plant TF, TR, and CR genes was generated from libraries ST1, ST2, ST3, and ST4. The analysis identified 2674 TF, TR, and CR genes, of which 1159 were differentially expressed among developmental stages (Online Resource_6). The predominant family type of regulators is TFs (76%), followed by transcription factor interactors (8%) and CRs (4%). The most abundant TFs belong to the C2H2, MYB-HB-like, WD40-like, AP2-EREBP, bHLH, WRKY, NAM, and bZIP families; the most abundant TRs belong to the CCHC (Zn) family and the most abundant CRs to the PHD family. GO annotation enrichment analysis was performed to identify overrepresented biological processes related to the identified differentially expressed regulators. The most relevant GO terms were associated with response to endogenous stimulus, post-embryonic development, signal transduction, response to abiotic stimulus, cell communication, flower development, cell differentiation, and anatomical structure morphogenesis (Fig. 7).

Fig. 7 Biological process network of GO term enrichment of DE transcription factors. The node size represents the number of genes associated to a given GO term and node fill color reflects the adjusted P value

A search for genes involved in embryo development and maturation was made taking into account Blast similarity and gene function in Arabidopsis thaliana. AP2-EREBP TFs with high homology to AINTEGUMENTA-like (AIL) genes were identified, namely homologs of AINTEGUMENTA-LIKE 5 (AIL5) and AINTEGUMENTA-LIKE 6 (AIL6). Interestingly, a homolog of PLETHORA 2 (PLT2), which is also an AP2-EREBP transcription factor, is upregulated in ST1, where its expression is 25–27 times higher than in ST2 and ST3, and a homolog of the AINTEGUMENTA (ANT) gene was found to be upregulated in ST3. With the opposite expression pattern, a CYTOKININ RESPONSE FACTOR 3 (CRF3) homolog was identified as downregulated in ST3. We also found a homolog of GATA transcription factor 8 (GATA8) expressed in all developmental stages, with particularly high expression in ST2 and ST3.

AUXIN RESPONSE FACTORs (ARFs) are TFs, which control auxin-regulated gene transcription. Two homologs of this type of TFs, AUXIN RESPONSE FACTOR 4 (ARF4) and ARF5, were also identified, both of which were upregulated in ST1.

Expression profile of homeostasis-related phytohormone genes

In order to explore hormone-mediated transcriptional regulation of gene expression during embryo development, we mapped the DEGs to eight hormone-related pathway categories: abscisic acid, auxin, brassinosteroid, cytokinin, ethylene, gibberellin, jasmonic acid, and salicylic acid. The analysis revealed that 250 genes associated with hormone biosynthesis, response, signaling, receptors, and metabolism showed significant differential expression during somatic embryo development (Online Resource_7). The main group showing significant DE is related to abscisic acid (27.6%), followed by auxin (23.2%), ethylene (20.0%), jasmonic acid and salicylic acid (7.6% each), gibberellin (5.6%), brassinosteroid (5.2%), and cytokinin (3.2%). In terms of biological functions, we found relevant Arabidopsis thaliana homologs within the ABA group, namely PHOSPHOLIPASE D ALPHA 1 (PLDALPHA1) and NUCLEAR CAP-BINDING PROTEIN SUBUNIT 1 (ABH1), which were highly expressed in all developmental stages. Within the AUX group, we found homologs of AUXIN EFFLUX CARRIER COMPONENT 2 (PIN2), which was highly upregulated in ST1; homologs of CULLIN-1 (CUL1) and TOPLESS (TPL) were found to be upregulated in both ST2 and ST3. A homolog of S-ADENOSYLMETHIONINE DECARBOXYLASE PROENZYME 4 (SAMDC4) was also identified as upregulated in ST1 and ST3. Related to AUX biosynthesis, a homolog of INDOLE-3-PYRUVATE MONOOXYGENASE YUCCA4 (YUC4) was identified as upregulated in ST1, and the GUANINE-NUCLEOTIDE EXCHANGE FACTOR GNOM (GN) was downregulated in ST4. Also related to AUX homeostasis, we identified a putative gene encoding ESCRT-RELATED PROTEIN CHMP1B (CHMP1B), which was highly expressed in all embryo developmental stages, especially in ST4. A putative coding sequence related to the BR signaling pathway, a homolog of the SOMATIC EMBRYOGENESIS RECEPTOR KINASE 1 (SERK1) gene, was downregulated in ST4.
The approach used to identify the hormone-mediated transcriptionally regulated genes also allowed the identification of an ET- and JA-responsive gene highly expressed in all developmental stages, especially in ST3: the CELLULOSE SYNTHASE A CATALYTIC SUBUNIT 3 (CESA3) homolog. A group of hormone-related genes not expressed in at least one developmental stage was also identified, namely THERMOSPERMINE SYNTHASE ACAULIS5 (ACL5), XYLOGLUCAN ENDOTRANSGLUCOSYLASE/HYDROLASE PROTEIN 24 (XTH24), and three HISTIDINE KINASE isoforms (HK1, HK3, and HK4).

Expression profile of genes involved in germination

Plant hormones can affect several biological processes including seed dormancy and germination (Graeber et al. 2012). Although seeds do not occur in the in vitro somatic embryo culture system used in this study, genes involved in seed germination may have analogous functions in the embryo maturation process. In fact, genes described as involved in seed germination and related to hormone homeostasis were found to be differentially expressed, namely transcripts with homology to ABA- and GA-related genes, the major classes of hormones responsible for seed germination (Miransari and Smith 2014). Within these, we highlight ABA-related genes such as homologs of ABC TRANSPORTER C FAMILY MEMBER5 (ABCC5), ARM REPEAT PROTEIN INTERACTING WITH ABF2 (ARIA), and BETA-D-XYLOSE 6 (BXL6), which were upregulated in ST2 and ST3. Related to CK homeostasis, two HISTIDINE KINASE (HK) homologs were identified: an HK3 homolog upregulated in ST4 and an HK4 homolog downregulated in ST2. Related to the GA signaling pathway, one homolog of the DELLA PROTEIN RGL1, upregulated in ST3, was found. Related to JA biosynthesis, we found a homolog of 3-KETOACYL-COA THIOLASE 2 (PED1), which showed increasing expression from ST1 to ST4, and a homolog of the TWO PORE CALCIUM CHANNEL PROTEIN 1 (TPC1), which was downregulated in ST4.

Expression profiles of other embryogenesis-related genes

Embryogenesis-related genes may be involved in a broad range of biological processes and molecular functions. Besides the TFs, TRs, CRs, and phytohormone-related genes categorized above, which regulate diverse developmental changes in embryogenesis, it is reasonable to expect that further genes outside these categories are involved in embryo development. To capture them, we manually searched the DE gene dataset for Blast descriptions and GO terms related to embryo and embryogenesis, excluding genes previously described in this study such as TFs, TRs, CRs, and phytohormone homeostasis-related genes. A list of 67 genes was compiled and a GO enrichment map, using the DEGs as reference set, was generated (Fig. 8). Gene ontologies related to embryo development and post-embryonic development were the most significantly enriched categories in this subset of genes. Besides their relevant role in embryogenesis, these genes are also involved in anatomical structure morphogenesis, reproduction and flower development, response to abiotic stimulus, and cellular processes (cell cycle, cellular component organization, and cell differentiation). These genes, together with their expression levels, are listed in Online Resource_8. The most relevant gene groups to highlight contain embryo defective, embryo abundant, and late embryogenesis abundant protein (LEA) homologs with different expression profiles during the four developmental stages, and three pentatricopeptide repeat protein (PPR) homologs downregulated in ST2, namely the PENTATRICOPEPTIDE REPEAT-CONTAINING PROTEIN At3g06430, PENTATRICOPEPTIDE REPEAT-CONTAINING PROTEIN At3g49240, and PENTATRICOPEPTIDE REPEAT-CONTAINING PROTEIN At3g53700.
A homolog of SUMO-ACTIVATING ENZYME SUBUNIT 2 (SAE2) and a homolog of SERINE/THREONINE-PROTEIN KINASE-LIKE PROTEIN ACR4 (ACR4) which were both upregulated in ST1 and a homolog of the ABC TRANSPORTER I FAMILY MEMBER 6 (ABCI6) upregulated in ST4 were also identified. Interestingly, a homolog of TRANSLATIONALLY CONTROLLED TUMOR PROTEIN 1 (TCPC1) with high expression values during all developmental stages, particularly in ST1, was identified.
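The selection described above (a keyword search of blast descriptions and GO terms, followed by exclusion of genes already assigned to the TF, TR, CR, or hormone categories) can be sketched in a few lines of Python. This is only an illustrative sketch and not the procedure actually used in the study, where the search was manual; the `Gene` record, the category labels, and the four example entries are hypothetical.

```python
from dataclasses import dataclass, field

# Substring used for the search; "embryo" also matches "embryogenesis" and "embryonic".
KEYWORDS = ("embryo",)
# Hypothetical labels for the categories already analysed elsewhere in the study.
EXCLUDED_CATEGORIES = {"TF", "TR", "CR", "hormone"}


@dataclass
class Gene:
    gene_id: str
    description: str                      # blast description
    go_terms: list = field(default_factory=list)
    category: str = ""                    # previously assigned category, empty if none


def is_embryo_related(gene):
    """True if any keyword occurs in the blast description or in a GO term."""
    texts = [gene.description, *gene.go_terms]
    return any(keyword in text.lower() for text in texts for keyword in KEYWORDS)


def select_other_embryo_genes(genes):
    """Keep embryo-related genes that were not already categorized."""
    return [
        gene for gene in genes
        if gene.category not in EXCLUDED_CATEGORIES and is_embryo_related(gene)
    ]


# Hypothetical example records (not taken from the dataset).
example = [
    Gene("g1", "late embryogenesis abundant protein", ["response to stress"]),
    Gene("g2", "AP2-like factor", ["embryo development"], category="TF"),
    Gene("g3", "pentatricopeptide repeat-containing protein",
         ["embryo development ending in seed dormancy"]),
    Gene("g4", "cellulose synthase", ["cell wall biogenesis"]),
]

selected_ids = [gene.gene_id for gene in select_other_embryo_genes(example)]
print(selected_ids)
```

In this toy input, `g2` is dropped because it is already categorized as a TF and `g4` because neither its description nor its GO terms mention embryos.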

Fig. 8 Biological process network of enriched GO terms for DE genes related to embryogenesis that were not included in the transcription factor and hormone-related gene sets. Node size represents the number of genes associated with a given GO term, and node fill color reflects the adjusted P value

Discussion

Quercus suber somatic embryos are an important biological resource for future breeding programs in this species. We provide the first overview of the transcriptome dynamics of somatic embryo development in Q. suber, which will increase our understanding of the complex mechanisms underlying the process. A total of 66,693 genes with distinct roles in various biological processes were identified, of which 11,507 were differentially expressed between embryo stages. Several genes grouped in different expression clusters (Fig. 5) were associated with biological processes, namely generation of precursor metabolites and energy, secondary metabolic processes, cell cycle, cellular component organization, embryo development, and response to external stimulus (Fig. 6). We centered the analyses on transcripts with potential functions in embryogenesis, namely TFs, TRs, and CRs, as well as transcripts related to phytohormone homeostasis and germination (Table 2).

Table 2 Expression values in counts per million (CPM) of relevant genes identified in cork oak somatic embryo development

Transcription factors, transcription regulators, and chromatin regulators involved in cork oak somatic embryogenesis and somatic embryo development

Recent genetic studies have identified genes involved in embryogenesis initiation and progression as well as in embryo development (Tzafrir et al. 2004; Abid et al. 2010; Radoeva and Weijers 2014). Many of these genes encode TFs that regulate gene expression and thereby control various plant biological processes, such as embryonic and post-embryonic development or embryo germination (Radoeva and Weijers 2014). In this study, 1159 differentially expressed TFs with gene ontologies closely related to response to stimulus, signal transduction, and structure development were identified (Fig. 7). In particular, we found a group of AP2-EREBP TFs, including AIL5, AIL6, PLT2, and ANT, differentially expressed across the four stages of cork oak somatic embryo development. The AP2-EREBP transcription factor family plays an essential role in cell proliferation and embryogenesis (El Ouakfaoui et al. 2010). BABY BOOM (BBM), a member of the family, is preferentially expressed in developing embryos, and when overexpressed, it induces the formation of somatic embryos from leaves, cotyledons, and the shoot apical meristem (Boutilier et al. 2002). A close homolog of BBM is AIL5, also known as EMBRYOMAKER (EMK) or PLT5 (Tsuwamoto et al. 2010; Prasad et al. 2011). Our data revealed an ortholog of AIL5 in cork oak. Its expression pattern shows that the gene is highly expressed in the four developmental stages, which suggests that QsAIL5 may have an active function during cork oak somatic embryogenesis and, similarly to its ortholog in Arabidopsis, may have an important role in embryonic identity maintenance (Tsuwamoto et al. 2010). In addition, AIL6 together with PLT1, PLT2, and BBM have redundant functions in root meristem and embryo differentiation and, in combination with AIL5, control shoot phyllotaxis (Galinha et al. 2007; Prasad et al. 2011).
PLT2 is also described as responsible for controlling the patterning within the root stem cell niche and is required for root stem cell activity during embryogenesis (Aida et al. 2004). Therefore, we hypothesize that QsAIL6 and QsPLT2 may play a role in cork oak hypocotyl and root development. Transcription of PLT genes is stimulated by auxin and is dependent on ARF transcription factors (Benjamins and Scheres 2008; Mahonen et al. 2014). In this context, it is particularly interesting to observe that both genes show identical decreasing expression from ST1 to ST4, which may partially explain the difficulty of cork oak somatic embryo root formation without the use of plant growth regulators. Specifically, ARF5 is critical for embryonic root formation (Schlereth et al. 2010), embryo axis formation, and vascular development (Hardtke and Berleth 1998). This study found an ortholog of ARF5, which is upregulated in ST1. The relatively high expression of QsARF5, QsPLT2, and QsAIL6 in ST1 suggests that these TFs may act together in early stages of cork oak somatic embryogenesis to control cell identity and embryo differentiation, similar to what is observed in Glycine max (El Ouakfaoui et al. 2010). However, detection of other members of the ARF family in the same stage points to a redundant function of genes of this family. We found an ortholog of AtARF4 also upregulated in ST1. In Q. suber, QsARF4 expression is specifically associated with the initial phase of acorn development (Miguel et al. 2015). In Arabidopsis, AtARF4 seems to act redundantly in the establishment of tissue patterning and floral meristem determinacy (Liu et al. 2014). This points to the importance and occurrence of tissue organization and patterning in early stages of cork oak development; however, further studies need to be performed in order to understand the function of the ARF genes in cork oak embryogenesis. CYTOKININ RESPONSE FACTORs (CRFs) also belong to the AP2-EREBP TF family.
Analysis of loss-of-function mutations revealed that CRFs function redundantly to regulate embryo, cotyledon, and leaf development (Rashotte et al. 2006). Our data show an ortholog of CRF3 with differential expression in ST1. As a transcription factor, QsCRF3 may have a key role in cork oak embryo development, linking the complex transcriptional regulatory network involving TFs and phytohormone action. Among the differentially expressed TFs identified, we also found a member of the GATA TF family. Its orthology with AtGATA8, which is a positive regulator of seed germination (Liu et al. 2005), leads us to hypothesize a putative role of QsGATA8 in somatic embryo germination, since we observed a reduction of gene expression levels at ST3 and ST4. This reduction may also be related to the difficulty of converting cork oak somatic embryos into emblings, documented in several studies (García-Martin et al. 2001; Pinto et al. 2002; Hernández et al. 2003).

Homeostasis-related phytohormone genes involved in cork oak somatic embryogenesis and somatic embryo development

Hormones are involved in modulating gene expression by controlling the abundance of two types of gene regulatory proteins, TFs and transcriptional repressors, thereby integrating and coordinating plant developmental responses. In this study, we found 250 DEGs associated with hormone biosynthesis, response, signaling, receptors, and metabolism. In early formed cork oak embryos, high endogenous levels of AUX associated with high methylation levels were reported after embryogenesis induction (Rodríguez-Sanz et al. 2014). We identified at least three AUX-related genes involved in embryogenesis and post-embryonic development: CUL1, SAMDC4, and YUC4. CUL1 targets various proteins involved in hormone response and signaling, morphogenesis, and control of the circadian clock (Shen et al. 2002; Willems et al. 2004; Harmon et al. 2008; Mockaitis and Estelle 2008). Several studies showed a reduced AUX response during development of cul1 loss-of-function mutants (Shen et al. 2002; Hellmann et al. 2003; Moon et al. 2007) and arrest in early embryogenesis stages in Arabidopsis (Shen et al. 2002). In this study, CUL1 is expressed in all developmental stages and may play a key role in AUX homeostasis in cork oak embryogenesis. An ortholog of SAMDC4, an important decarboxylase in the plant polyamine biosynthetic pathway, was also identified. In Arabidopsis, SAMDC4 is essential for plant embryogenesis, normal growth, and development (Ge et al. 2006). In the present study, QsSAMDC4 is expressed during all developmental stages and is downregulated in ST2 and ST4. YUC4 is an AUX-responsive gene and is related to hormone biosynthesis and plant development. YUC genes are mainly expressed in meristems, young primordia, vascular tissues, and reproductive organs. In Arabidopsis, Cheng and co-authors (Cheng et al. 2007) showed that AUX synthesized by YUC genes is essential for establishment of the basal body region during embryogenesis and for formation of embryonic and post-embryonic organs.
Our data show the presence of an ortholog of YUC4 upregulated in ST1. This result suggests a potentially important involvement of QsYUC4 in cork oak embryogenesis initiation and maintenance. The CHMP1B gene was also identified; it is related to multivesicular body formation and sorting of PIN AUX carriers and is essential for embryo development (Spitzer et al. 2009). In Arabidopsis, CHMP1B is involved in cellular differentiation and embryo symmetry establishment (Spitzer et al. 2009). We found QsCHMP1B to be expressed in all developmental stages, with high relative expression values, especially in ST4. Therefore, QsCHMP1B may be relevant in embryo symmetry establishment and cellular differentiation in cork oak embryo development, particularly in ST4. An ortholog of SERK1, a BR receptor, was identified as downregulated in ST4. SERK1 is described as a key factor in embryogenic competence regulation, as it regulates cell death and increases stress resistance (Hu et al. 2005). Moreover, Arabidopsis seedlings overexpressing AtSERK1 exhibited an increased efficiency for initiation of somatic embryogenesis, which led to the suggestion that a 3- to 4-fold increase in AtSERK1 expression is sufficient to confer embryogenic competence in culture (Hecht et al. 2001). In our cultures, secondary embryogenesis was observed in embryo clusters from ST1 to ST3, but this capacity was lost in ST4, where embryos follow a maturation pathway. QsSERK1 was expressed at similar mRNA levels from ST1 to ST3, whereas in ST4 its expression decreased about 2.5- to 3-fold, suggesting that QsSERK1 may have the same ability as its ortholog in Arabidopsis.
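As a minimal sketch of how a fold decrease such as the one reported for QsSERK1 can be derived from CPM values, the following Python snippet divides the mean expression of the earlier stages by the ST4 expression. The CPM values are placeholders chosen purely for illustration and are not taken from Table 2; the function name `fold_decrease` is likewise an assumption.

```python
import math
from statistics import mean


def fold_decrease(reference_cpms, target_cpm):
    """Ratio of the mean reference expression to the target expression."""
    if target_cpm <= 0:
        raise ValueError("target CPM must be positive to compute a ratio")
    return mean(reference_cpms) / target_cpm


# Placeholder CPM values, NOT the values measured in this study.
serk1_cpm = {"ST1": 30.0, "ST2": 28.0, "ST3": 32.0, "ST4": 11.0}

early_stages = [serk1_cpm[stage] for stage in ("ST1", "ST2", "ST3")]
ratio = fold_decrease(early_stages, serk1_cpm["ST4"])
log2_fc = -math.log2(ratio)  # negative sign marks a decrease in ST4

print(f"fold decrease: {ratio:.2f}, log2 fold change: {log2_fc:.2f}")
```

Expressing the change on a log2 scale makes increases and decreases symmetric around zero, which is why differential expression is commonly reported that way.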

Our data also revealed several differentially expressed genes that are not expressed in at least one developmental stage. ACL5 is not expressed in ST1, but its expression increases from ST2 to ST4. In Arabidopsis, ACL5 is required for xylem specification by regulating and preventing premature death of xylem vessel elements (Hanzawa et al. 2000; Kakehi et al. 2008; Muñiz et al. 2008). Baima and co-workers demonstrated that thermospermine produced by ACL5 is one of the factors downstream of AUX synthesis contributing to the regulation of vascular differentiation (Baima et al. 2014). Recently, Milhinhos and co-workers (Milhinhos et al. 2013) described a feedback regulatory path in poplar secondary xylem in which thermospermine levels are controlled by an AUX-dependent feedback loop mechanism involving the poplar ortholog of Arabidopsis thaliana HOMEOBOX 8 (ATHB8). Specifically, ATHB8 is expressed in procambial cells and is directly regulated by ARF5 (Baima et al. 2014). Our data reveal the presence of two ATHB8 orthologs, expressed in all developmental stages and with similar expression patterns. Our report thus describes the presence of well-known key players in AUX-mediated processes, including hypocotyl elongation and differentiation of xylem precursor cells. Taking into account the recalcitrance of cork oak somatic embryos to elongate and form leaves in vitro, the identification of these players may help to design modulation strategies for cork oak plantlet conversion.

Genes involved in cork oak somatic embryo germination

The proper maturation and germination of somatic embryos constitute one of the most important factors in conversion into plants (Vieitez et al. 2012). Pérez et al. (2015) reported a decrease in ABA content and in 5-methyl-deoxycytidine during cork oak somatic embryo germination. Although epigenetic control and dynamic ABA levels appear to play an important role in the correct maturation and subsequent germination of embryos (Pérez et al. 2015), identification of other elements is needed to fully understand the process.

The current study also identifies several expressed genes related to germination, namely ABCC5 homologs, upregulated in ST2 and ST3. ABCC5 is involved in primary root elongation and lateral root formation and also plays a role in ABA-mediated germination inhibition (Gaedeke et al. 2001; Martinoia et al. 2002; Klein et al. 2003; Lee et al. 2004). These transcripts may have important roles in cork oak somatic embryo germination, opening the way for future studies on this topic. In addition, the results revealed the presence of ARIA orthologs, which act as positive regulators of ABA response (Kim et al. 2004). In our data, the putative QsARIA is downregulated in ST4, which may reflect a weakening of the ABA-mediated inhibition of germination; however, the gene is still expressed, so this decrease may be insufficient for embryo germination to take place in the studied embryogenic cell line. Also involved in regulation of germination is QsTPC1, downregulated in ST4. TPC1 is reported to inhibit ABA-dependent germination (Peiter et al. 2005) and to be involved in JA homeostasis (Bonaventure et al. 2007). Taken together, the expression patterns of ARIA and TPC1 lead us to hypothesize the presence of a genetic driving force inhibiting cork oak embryo germination, specifically at ST4. In contrast to ABA regulation of germination, GA promotes seed germination in many plant species (Seo et al. 2009). The regulation of GA homeostasis is in part related to DELLA protein action (Cao et al. 2005). In our data, we found an ortholog of RGL1, a member of the DELLA family and a well-known negative regulator of GA response (Wen and Chang 2002), upregulated in ST3 and ST4. Because RGL1 negatively regulates the GA signaling pathway, the high expression levels observed particularly in ST3 and ST4 may explain in part the low cork oak embryo germination rates by impairing GA action in germination.
Another plant hormone that plays positive and negative regulatory roles in many aspects of plant growth and development is cytokinin. Riefler and co-workers demonstrated that different receptors, particularly AtHK4 and AtHK3, were able to mediate CK control of seed germination, showing that CK is a negative regulator of seed germination and that distinct pathways are controlled by different CK receptors (Riefler et al. 2006). Expression levels of the Q. suber HK3 and HK4 orthologs are opposed along somatic embryo development, meaning that when expression of one gene decreases, expression of the other increases. This observation may be related to the redundant function of histidine kinase receptors. These findings may reveal players acting in the inhibition of cork oak germination; however, further studies are required to clarify the effect of these genes on cork oak somatic embryo conversion. It will be interesting to study how the genes respond to different hormone treatments, namely ABA, GA, and CK, and how the repression and overexpression of these genes affect conversion. These findings may have a deep impact on future cork oak reforestation and breeding programs, since the low percentage of somatic embryo germination is a crucial limiting factor in the process.
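The "opposed" behaviour described above for the HK3 and HK4 orthologs can be stated precisely: at every transition between consecutive stages, one profile rises while the other falls. The sketch below checks this condition in Python. The helper names and the expression values are hypothetical placeholders, not measurements from this study.

```python
def stage_changes(profile):
    """Differences between consecutive stages (ST1->ST2, ST2->ST3, ...)."""
    return [later - earlier for earlier, later in zip(profile, profile[1:])]


def are_opposed(profile_a, profile_b):
    """True if, at every stage transition, the two profiles move in opposite directions."""
    changes = zip(stage_changes(profile_a), stage_changes(profile_b))
    return all(change_a * change_b < 0 for change_a, change_b in changes)


# Placeholder CPM profiles over ST1..ST4, NOT the values measured in this study.
hk3_profile = [5.0, 8.0, 12.0, 20.0]
hk4_profile = [18.0, 9.0, 7.0, 4.0]

opposed = are_opposed(hk3_profile, hk4_profile)
print(opposed)
```

A stricter or more tolerant criterion (for example a negative correlation coefficient) could be substituted; the sign test is used here only because it mirrors the wording of the text.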

Other relevant genes in cork oak somatic embryogenesis

Somatic embryogenesis is marked by a high mitotic rate and accompanied by morphogenetic events affected by many factors, such as stress signals and primary compound synthesis. A proteomic study of Q. suber somatic embryogenesis has already revealed important players in the process, namely proteins associated with glycolysis, hormone biosynthesis, compound storage, cell division, reactive oxygen species detoxification, and stress response (Gomez-Garay et al. 2013). An important group of proteins involved in response to biotic and abiotic stress is the LEA protein family (Hincha and Thalhammer 2012). We identified several LEA transcripts, namely hydroxyproline-rich glycoproteins, and three PPR proteins that were differentially expressed and closely related to aspects of embryo development. The first PPR gene identified was an ortholog of the PPR At3g06430. PPR At3g06430 mutants are associated with delayed embryogenesis and germination abnormalities (Lu et al. 2011). The second PPR gene identified is an ortholog of PPR At3g49240, also known as EMB1796. The emb1796 mutants fail to develop beyond the globular stage, exhibiting consistent and severe developmental arrest (Cushing et al. 2005). Finally, the third PPR gene identified is an ortholog of PPR At3g53700, which is involved in protein degradation, cell death, signal transduction, and transcriptional regulation required for early embryogenesis (Pagnussat et al. 2005). Furthermore, we identified orthologs of ABCI6 and SAE2, two other genes essential for embryogenesis. AtABCI6 plays an important role in plastid Fe-S cluster maintenance and repair during embryogenesis and, as for AtSAE2, its mutants are embryo lethal (Xu and Møller 2004; Saracco et al. 2007). The presence of these genes in the data may constitute evidence for a group of genes essential for cork oak embryo development. In fact, the GO term “embryo development” is enriched in the “other relevant genes in cork oak somatic embryogenesis” subset (Fig. 8).
Plant development is a process involving precise coordination and regulation of cell proliferation and differentiation. TCPC1 is a positive regulator of mitotic growth, controlling the duration of the cell cycle both in animals and in plants and therefore acting in a complex but poorly understood regulatory network (Brioudes et al. 2010). ACR4 is another important player in cell division, particularly in meristems, root tips, and lateral initiation zones of the pericycle (Watanabe et al. 2004; De Smet et al. 2008; Stahl et al. 2009). In addition, it is required during embryogenesis and embryo development (Tanaka et al. 2002). The expression of QsTCPC1 increases from ST1 to ST4, probably promoting the cell division and embryo growth observed through the developmental stages. Also, ACR4 is upregulated in ST1 and ST3, which may indicate high meristematic activity at these stages of development. Identification of these genes represents an important advance in the knowledge of cork oak somatic embryogenesis. Nevertheless, functional studies will be needed in the near future to determine the exact role of each of these genes.

This work allowed the characterization of cork oak somatic embryogenesis and the identification of several genes in four distinct stages of embryo development, from a single tree genotype. Genes related to transcriptional regulation were specifically addressed in this study since they were the main class of regulators identified throughout the embryo developmental process and were involved in over-represented specific processes such as cell differentiation, response to endogenous stimulus, post-embryonic development, and signal transduction (Figs. 6, 7, and 8). Hormone-related genes were also highlighted, including genes involved in root formation and development as well as genes involved in germination, which showed active expression dynamics in cork oak somatic embryogenesis. Functional studies of the genes identified in this work will be of great importance to better understand their role in somatic embryo development. It is fundamental to take into account that this study used a single tree genotype and that the results may not reflect what happens in other genotypes. Nevertheless, the knowledge gathered in this study and in upcoming functional studies will be valuable for the design of successful regeneration protocols that can be integrated into future breeding programs. Additionally, the dataset generated in this work contributes to the enlargement of cork oak genomics resources.