1 Introduction

DNA binding with One Finger (Dof) transcription factors are crucial in plant growth and development (Gupta et al. 2015; Malviya et al. 2015). This plant-specific transcription factor gene family encodes proteins with a highly conserved domain of 50–52 amino acids containing a typical C2C2-type zinc finger motif, a DNA-binding motif, at the N-terminus (Song et al. 2016; Zou et al. 2013). Dof transcription factors play various roles in processes that are unique to plants, including nitrogen assimilation (Wang et al. 2013a; Yanagisawa et al. 2004), accumulation of seed storage proteins (Dong et al. 2007), carbon metabolism (Gupta et al. 2015), intracellular trafficking of proteins (Chen et al. 2013), endosperm-specific responses (Diaz et al. 2005), defence responses (Takano et al. 2013), seed germination (Noguero et al. 2013), tolerance of drought and salt (Ma et al. 2015), photoperiodic control of flowering (Fornara et al. 2009), regulation of the formation of branches, shoots and seed coats (Zou et al. 2013), regulation of genes associated with stomatal functioning and morphogenesis (Negi et al. 2013), and the circadian cycle (Yang et al. 2011).

The number of Dof genes varies widely among crops, which suggests a high likelihood of functional diversification. Genome-wide analysis of the relative phylogeny of the Arabidopsis and rice Dof gene families identified 36 and 30 Dof genes, respectively (Lijavetzky et al. 2003). Likewise, attempts have been made to study the evolutionary characteristics of Dof gene families by differentiating 34, 36 and 41 Dof genes of tomato, Arabidopsis and poplar, respectively (Cai et al. 2013; Yang and Tuskan 2006). The number of potential Dof genes in Solanum tuberosum (Venkatesh and Park 2015), Hordeum vulgare, Capsicum annuum (Kang et al. 2016; Wu et al. 2016b), Chrysanthemum morifolium (Song et al. 2016), and Cucurbita sp. is 35, 24, 34, 33, 20, and 36, respectively (Hernando-Amado et al. 2012; Mena et al. 2002; Moreno-Risueno et al. 2007).

Olea europaea var. sylvestris is the wild form of the olive tree and is commonly named oleaster. Mainly found in the Mediterranean Basin, it is considered to be one of the oldest trees worldwide. Previous studies have provided evidence that cultivated olive trees, i.e., Olea europaea L. var. europaea, are closely similar to oleasters and, hence, support the idea that oleasters have an ancestral relationship with cultivated olive trees (Kassa et al. 2019; Kyriakopoulou and Kalogianni 2020). It is a small, evergreen tree and a diploid species (2n = 2x = 46) with an estimated genome size of 3.19 ± 0.047 pg/2C DNA (https://plants.ensembl.org/Olea_europaea_sylvestris/Info/Index). Wild olive is resistant to certain diseases and to certain environmental and climatic conditions (Beghe et al. 2017). The health benefits of both wild and cultivated olive oil give both high economic and nutritional value, but they have also made olive oil one of the agricultural products most susceptible to counterfeiting and fraud. Wild olive has a higher level of antioxidant activity and phenolic content, as well as tocopherolic and orthodiphenolic contents that are equal to or higher than those in extra virgin cultivated olive oil (Bouarroudj et al. 2016). Owing to their high antioxidant activity, phenolic extracts from the leaves of wild olive have been investigated for use in foodstuffs, food additives and functional food materials (Lafka et al. 2013; Mohamed et al. 2007). The cosmetics and pharmaceutical industries have been using wild olive for its valuable characteristics to manufacture products. Research has shown that wild olive has antimicrobial activity against certain bacterial pathogens that target humans (Paudel et al. 2011). Wild olive and its cultivated form have been used to produce several food supplements (Colombo 2016). The complete genome of Olea europaea var. sylvestris has been sequenced (Unver et al. 2017).

The Mediterranean olive tree (O. europaea subsp. europaea) is one of the first domesticated trees and a major agricultural crop of high importance in the Mediterranean region because it is the source of olive oil (Baldoni et al. 2006; Breton et al. 2009; Diez et al. 2015; Lumaret and Ouazzani 2001). Despite the rising importance of olive as an economic and nutritious oil fruit crop, no significant research about its Dof transcription factors has been reported. The main objective of this study was to identify and characterize the genes belonging to the Dof transcription factor family in the wild olive genome (Cruz et al. 2016) using various bioinformatics tools. Briefly, a systematic approach was followed to identify Dof genes from the wild olive genome. Their chromosomal distribution, intron/exon distribution pattern, presence of conserved domains and cis-regulatory elements were also investigated. The comparative phylogenetic analysis of Dof genes from wild olive and A. thaliana was also carried out to determine the orthologous relationships and to discern their probable functions. Our extensive genome-wide evaluation of Dof gene family members in wild olive provides a reference and an opportunity for functional analysis and cloning of the members of this gene family in other olive subspecies.

2 Materials and methods

2.1 Database search and retrieval of sequence

The amino acid sequence of the Dof DNA-binding domain (PF02701) was retrieved from Pfam (http://pfam.xfam.org/) (Finn et al. 2014). The 59-aa Dof domain sequence from A. thaliana (accession no. NP_175581) was used to identify Dof protein-encoding genes in the wild olive proteome database at Phytozome v12 (https://phytozome.jgi.doe.gov/pz/portal.html/; https://phytozome-next.jgi.doe.gov/info/Oeuropaea_v1_0/) using BLASTP (protein Basic Local Alignment Search Tool) (Goodstein et al. 2014). The retrieved amino acid sequences were analyzed using the Simple Modular Architecture Research Tool (SMART; http://smart.embl-heidelberg.de/) (Letunic and Bork 2018) and the NCBI Conserved Domain Database (CDD) (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi/) (Lu et al. 2020) with the default parameters. Any predicted proteins lacking the conserved Dof domain (PF02701) (https://pfam.xfam.org/family/PF02701/) were excluded.

2.2 Determination of physicochemical properties of olive Dof proteins

The protein length (amino acid residues), molecular weight, and theoretical pI of OeuDof proteins were predicted using the ProtParam tool (http://web.expasy.org/protparam/) (Gasteiger et al. 2005). The information on gene IDs, chromosomal positions, and gene and protein sequences was retrieved from Phytozome. The OeuDof genes were renamed according to the order of their physical positions. Nuclear localization signals in olive Dof proteins were predicted using the online server NLSdb (https://rostlab.org/services/nlsdb/) (Cokol et al. 2000). Subcellular localization of OeuDofs was predicted using the online tool WoLF PSORT (https://wolfpsort.hgc.jp/) (Horton et al. 2006).
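The renaming by physical position can be illustrated with a short Python sketch. It assumes each gene is described by an ID, a chromosome label of the form chrN (or a scaffold name) and a start coordinate. The gene IDs, coordinates, the chr prefix and the choice to place scaffold genes last are hypothetical and are not taken from the Phytozome annotation.

```python
import re


def chromosome_sort_key(name):
    """Sort numbered chromosomes numerically and place scaffolds after them."""
    match = re.fullmatch(r"chr(\d+)", name)
    if match:
        return (0, int(match.group(1)), name)
    return (1, 0, name)  # assumption: unplaced scaffolds come last


def rename_by_position(genes, prefix="OeuDof"):
    """Map original gene IDs to sequential names ordered by position.

    genes: iterable of (gene_id, chromosome, start) tuples.
    """
    ordered = sorted(genes, key=lambda g: (chromosome_sort_key(g[1]), g[2]))
    return {
        gene_id: f"{prefix}{index}"
        for index, (gene_id, _, _) in enumerate(ordered, start=1)
    }


# Hypothetical gene IDs and coordinates, for illustration only.
example_genes = [
    ("geneC", "chr2", 500),
    ("geneA", "chr1", 9000),
    ("geneB", "chr1", 1200),
    ("geneD", "scaffold_12", 40),
]
new_names = rename_by_position(example_genes)
```

Sorting on a (chromosome, start) key keeps the numbering stable when genes are added, provided the whole list is renamed again.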

2.3 Gene structure analysis

To investigate the intron/exon arrangement of OeuDofs, the genomic and coding sequences of identified genes were retrieved from the Phytozome database. Moreover, the gff3 file of the olive genome was also retrieved from Phytozome v12. These sequences were further used to draw the gene structure using Gene Structure Display Server (GSDS v2.0) (Hu et al. 2015) (available at http://gsds.cbi.pku.edu.cn/).
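As a rough illustration of how intron numbers can be derived from a gff3 file, the Python sketch below counts exon features per transcript and reports introns as exons minus one. It assumes standard GFF3 conventions (nine tab-separated columns, exon features carrying a Parent attribute); the example records are hypothetical, and GSDS itself is not reproduced here.

```python
from collections import Counter


def intron_counts(gff3_lines):
    """Return {transcript_id: intron count}, where introns = exons - 1."""
    exon_totals = Counter()
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue
        attributes = dict(
            pair.split("=", 1) for pair in fields[8].split(";") if "=" in pair
        )
        # An exon may list several parents separated by commas.
        for parent in attributes.get("Parent", "").split(","):
            if parent:
                exon_totals[parent] += 1
    return {transcript: total - 1 for transcript, total in exon_totals.items()}


def gff_row(feature, start, end, attributes):
    """Build one hypothetical GFF3 record on the plus strand of chr1."""
    return "\t".join(
        ["chr1", "example", feature, str(start), str(end), ".", "+", ".", attributes]
    )


# Hypothetical annotation: tx1 has one exon, tx2 has three.
example_gff = [
    "##gff-version 3",
    gff_row("mRNA", 100, 900, "ID=tx1"),
    gff_row("exon", 100, 900, "ID=tx1.e1;Parent=tx1"),
    gff_row("mRNA", 2000, 3500, "ID=tx2"),
    gff_row("exon", 2000, 2300, "ID=tx2.e1;Parent=tx2"),
    gff_row("exon", 2600, 2900, "ID=tx2.e2;Parent=tx2"),
    gff_row("exon", 3200, 3500, "ID=tx2.e3;Parent=tx2"),
]
introns = intron_counts(example_gff)
```

Counting exon features includes introns that fall in untranslated regions; counting CDS features instead would restrict the result to introns that interrupt the coding sequence.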

2.4 Multiple sequence alignment and phylogenetic analysis

The amino acid sequences of Dof proteins were aligned using Clustal W version 2.1 (Thompson et al. 2003, 1994), and the phylogeny was constructed in MEGA X v2.0 (Kumar et al. 2018) using the neighbour-joining (NJ) method with 1000 bootstrap replications and partial deletion. In all, 51 olive Dof and 35 Arabidopsis Dof protein sequences were used for phylogenetic analysis.

2.5 Cis-regulatory elements and conserved motif recognition

For the analysis of promoter regions, the 1000-bp sequence upstream of the initiation codon was retrieved for each putative OeuDof gene. The PlantCARE database (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/) (Rombauts et al. 1999) was then used to predict cis-regulatory elements in these sequences, and the predictions were validated against the PLACE database (http://www.dna.affrc.go.jp/PLACE/) (Higo et al. 1998, 1999).
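Retrieving the region upstream of the initiation codon depends on the strand of the gene. The following Python sketch shows one way to do this, assuming 1-based inclusive coordinates for the start codon on the forward strand. The function names and the toy sequence are hypothetical and do not describe the actual retrieval procedure used with Phytozome.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(chrom_seq, codon_start, codon_end, strand, length=1000):
    """Return up to `length` bp upstream of the start codon, 5'->3' on the gene strand.

    codon_start, codon_end: 1-based inclusive forward-strand coordinates
    of the start codon.
    """
    if strand == "+":
        begin = max(0, codon_start - 1 - length)
        return chrom_seq[begin:codon_start - 1]
    if strand == "-":
        # Upstream of a minus-strand gene lies to the right on the forward strand.
        region = chrom_seq[codon_end:codon_end + length]
        return reverse_complement(region)
    raise ValueError("strand must be '+' or '-'")


# Toy 18-bp sequence, for illustration only.
toy_chromosome = "AAAACCCCATGGGGTTTT"
plus_promoter = upstream_region(toy_chromosome, 9, 11, "+", length=4)
minus_promoter = upstream_region(toy_chromosome, 8, 10, "-", length=6)
```

Near the ends of a chromosome or scaffold the returned sequence is simply shorter than the requested length.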

Multiple EM for Motif Elicitation (MEME) (http://meme.nbcr.net/meme/) (Bailey et al. 2015) was used to analyze motifs using the predicted protein sequences of the OeuDofs with the maximum number of motifs set as 20. The minimum and maximum width of motifs were set to 6 and 50, respectively, as default values along with other factors.

2.6 Gene duplication and synteny analysis

The time of divergence of the olive Dof gene family was estimated using Ks and Ka values. Protein sequence alignments were made using Clustal W, and the Ka and Ks substitution rates were then determined using the Nei-Gojobori model in MEGA X. The rate variation among sites was modelled with a gamma distribution (shape parameter = 1). The parameters were configured as described in the software package manuals. The Ka/Ks ratios were calculated to predict the rates of molecular evolution for each paralogous gene pair. The time of divergence (T) was estimated by T = Ks/2λ, where λ = 1.8 × 10⁻⁹ (Bettaieb and Bouktila 2020).
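The relation T = Ks/2λ can be written as a small Python helper. The sketch assumes that λ is expressed in substitutions per synonymous site per year, so that dividing by 10⁶ gives millions of years (Mya). The Ka and Ks values in the example are hypothetical and are not taken from the study's tables.

```python
LAMBDA = 1.8e-9  # value given in the text; units per site per year are assumed


def ka_ks_ratio(ka, ks):
    """Ratio of nonsynonymous to synonymous substitution rates."""
    if ks <= 0:
        raise ValueError("Ks must be positive")
    return ka / ks


def divergence_time_mya(ks, rate=LAMBDA):
    """Divergence time T = Ks / (2 * lambda), converted to millions of years."""
    return ks / (2 * rate) / 1e6


def selection_mode(ratio):
    """Conventional reading of Ka/Ks: <1 purifying, 1 neutral, >1 positive."""
    if ratio < 1:
        return "purifying"
    if ratio > 1:
        return "positive"
    return "neutral"


# Hypothetical substitution rates for one paralogous pair.
example_ka, example_ks = 0.0473, 0.1314
example_ratio = ka_ks_ratio(example_ka, example_ks)
example_time = divergence_time_mya(example_ks)
example_mode = selection_mode(example_ratio)
```

With these illustrative inputs the ratio is about 0.36 and the estimated time is about 36.5 Mya, which is the order of magnitude discussed for the olive pairs.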

The Multiple Collinearity Scan toolkit (MCScanX) was adopted to analyze the gene duplication events, with the default parameters (Wang et al. 2013b). To exhibit the synteny relationship of the paralogous Dof genes obtained from the olive, the syntenic analysis map was constructed using the Micro Synteny view software in TBtools (Chen et al. 2020).

2.7 Transcriptome analysis

To analyze the organ-specific expression profile of OeuDof genes at various developmental stages, we obtained previously generated RNA-seq data for olive plant tissues, including fruit, flower, leaf, meristem, root and stem, of mature olive trees under field conditions (Ramirez-Tejero et al. 2020). For expression profiling, Reads Per Kilobase per Million mapped reads (RPKM) values from the RNA-seq data were log2 transformed. Expression patterns with hierarchical clustering were displayed using the Heatmap illustrator in TBtools (Chen et al. 2020).
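The log2 transformation of RPKM values can be sketched as follows. The text does not state whether a pseudocount was added before taking logarithms; the sketch assumes log2(RPKM + 1), a common convention that keeps zero values defined. The gene name and expression values are hypothetical.

```python
import math


def log2_transform(rpkm_by_gene, pseudocount=1.0):
    """Apply log2(value + pseudocount) to every RPKM value of every gene."""
    return {
        gene: [math.log2(value + pseudocount) for value in values]
        for gene, values in rpkm_by_gene.items()
    }


# Hypothetical RPKM values for one gene across three organs.
example_rpkm = {"geneX": [0.0, 3.0, 7.0]}
example_log2 = log2_transform(example_rpkm)
```

If the transformation was applied without a pseudocount, setting `pseudocount=0.0` reproduces that behaviour, but zero RPKM values would then have to be handled separately.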

2.8 Putative microRNA target site analysis

The microRNA (miRNA) datasets of the olive tree, generated in an experiment related to drupe ripening, were retrieved from the NCBI Gene Expression Omnibus (GEO) (Gardiner 2010; Iwamoto 2016; Konishi 2007; Miyashima 2019). To identify miRNAs that could target the wild olive OeuDof genes, the CDS sequences of all wild OeuDof genes were searched for sequences complementary to miRNAs, using psRNATarget (https://plantgrn.noble.org/psRNATarget/analysis?function=3/) (Samad 2017) with default parameters.
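The idea of searching a CDS for sequences complementary to a miRNA can be illustrated with a deliberately simplified Python sketch. It only counts mismatches against the perfect reverse complement of the miRNA and is not the psRNATarget scoring scheme, which uses its own expectation score and additional criteria. The miRNA and CDS sequences are hypothetical.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def target_site_template(mirna):
    """Return the DNA-alphabet sequence (5'->3') perfectly complementary to a miRNA."""
    rna = mirna.upper().replace("T", "U")
    return rna.translate(RNA_COMPLEMENT)[::-1].replace("U", "T")


def find_complementary_sites(cds, mirna, max_mismatches=0):
    """List (0-based position, mismatches) of windows complementary to the miRNA."""
    site = target_site_template(mirna)
    cds = cds.upper().replace("U", "T")
    hits = []
    for start in range(len(cds) - len(site) + 1):
        window = cds[start:start + len(site)]
        mismatches = sum(a != b for a, b in zip(window, site))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Hypothetical short sequences, for illustration only.
example_mirna = "UGGAAGC"
example_cds = "ATGGCTTCCATAA"
example_hits = find_complementary_sites(example_cds, example_mirna)
```

Raising `max_mismatches` relaxes the search, which loosely mirrors the way target-prediction tools tolerate imperfect pairing.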

3 Results

3.1 Identification of Dof genes in olive

To identify the Dof genes, the sequence of the Dof domain was used as a BLAST query against the whole genome sequence of wild olive retrieved from the Phytozome database. An initial analysis led to the identification of 53 proteins. Proteins encoded by isoforms of the same gene, as well as proteins containing a truncated Dof DNA-binding domain, were excluded from the analysis. A total of 51 non-redundant OeuDof genes were identified and used for further analysis. These non-redundant Dof protein sequences from wild olive included the four highly conserved cysteine residues that coordinate the zinc ion, which is a typical feature of Dof proteins. Within the highly conserved olive Dof domain, 24 out of 50 amino acids were found to be 100% conserved in all the Dof domain sequences (Fig. 1). The conserved residues were Cys1, Pro2, Arg3, Cys4, Ser6, Lys10, Phe11, Cys12, Tyr13, Asn15, Asn16, Tyr17, Gln21, Pro22, Arg23, Cys26, Cys29, Arg31, Trp33, Thr34, Gly36, Gly37, Arg40 and Gly45 (Fig. 1), while the remaining 26 amino acids were variable among the OeuDof proteins. The OeuDof genes encode proteins ranging from 162 to 574 amino acids in length, with molecular weights ranging from 16.2 to 63.3 kDa; OeuDof 41 is the shortest and OeuDof 23 the longest protein (Table 1). The isoelectric points of the identified proteins ranged from 4.73 to 10.00. A total of 9 OeuDofs, namely OeuDof 7, OeuDof 41, OeuDof 43, OeuDof 44, OeuDof 45, OeuDof 46, OeuDof 48, OeuDof 49 and OeuDof 51, showed nuclear localization signals (NLSs) as predicted using NLSdb (https://rostlab.org/services/nlsdb/) (Cokol et al. 2000). The NLS of OeuDof 41, OeuDof 43, OeuDof 44, OeuDof 45, OeuDof 46, OeuDof 48, and OeuDof 49 was GAGRRK, while OeuDof 7 and OeuDof 51 showed PKKGRK and TLASMR, respectively (Table S2).

Fig. 1

Sequence logos based on alignments of all olive Dof domains. Dof domains are highly conserved across all 51 Dof proteins in olive. Multiple alignment analysis of 51 typical olive Dof domains was performed with ClustalW. The bit score indicates the information content for each position in the sequence. Cysteine residues (Cys) in the Dof domain are conserved and are present at positions 1, 4, 12, 26, and 29. The zinc finger motif is indicated by the green line

Table 1 Information about 51 non-redundant Dof genes discovered from the genome of olive

3.2 Gene structures and recognition of conserved motifs and domains

The organization of exons and introns forms the backbone of a gene and provides supporting evidence for the study of evolutionary relationships between genes or organisms (Koralewski and Krutovsky 2011). Their numbers and distribution patterns are an evolutionary mark for a gene family. Comparison of the exon–intron structures of olive Dof genes with their phylogeny revealed that the gene structure pattern was consistent with the phylogenetic analysis. The number of introns varied from zero to seven in olive (Fig. 2; Table S5). Twenty-six OeuDof genes have no intron (50.9%), twenty OeuDof genes have one intron (39.2%), four OeuDof genes have two introns (7.8%), one gene (OeuDof 33) contains three introns and one gene (OeuDof 23) contains seven introns (Table S5; Fig. 2). All of the OeuDof genes in subfamily D2 possessed no introns, while the number of introns in subfamily C2.1 varied from zero to three (Table S5). As with the Dof genes studied in various other species, some olive Dof genes possess no intron while others possess multiple introns, up to seven (Table S5; Fig. 2).

Fig. 2

Phylogenetic relationships and gene structures of Dof genes from the olive. a The phylogenetic tree was constructed based on the full-length sequences of olive Dof. b Exon–intron structures of the olive Dof genes. Blue boxes indicate exons; green lines indicate introns

The identity and distribution of 20 motifs within all the wild olive Dof proteins were studied using the MEME program (Fig. 3). The Dof domain was present in all the OeuDof proteins. Dof genes in the same group encode similar motifs, which suggests that these conserved motifs play an essential role in activities that are specific to a group or subgroup. All 51 olive Dof genes encode the same Dof domain. In OeuDof 23, an additional fibronectin type-III (FN3) domain was found. This FN3-like domain is also present at the C-terminus of cucumisin, a serine protease from melon fruits (Wen et al. 2016). The distribution of similar motifs among various Dof genes suggests that such genes might have arisen through gene expansion.

Fig. 3

Distribution of 20 motifs on 51 Dof proteins of olive. Analysis was carried out using MEME version 4.9.0, and the motifs are shown alongside the phylogenetic tree to illustrate their association. The bars represent motifs, with a different colour code for each type of motif

3.3 Comparative phylogenetic relatedness of olive Dof gene family with Arabidopsis

To investigate the evolutionary relationships between the Dof transcription factors of wild olive and Arabidopsis thaliana, a Neighbor-Joining (NJ) phylogenetic tree was constructed in MEGA X by aligning their full-length protein sequences. The Dof proteins fell into nine subgroups, named D1, B2, C3, C2.2, C1, C2.1, B1, A, and D2, and the 51 OeuDof proteins were distributed among eight of them, all except C3 (Table S1; Fig. 4). Group D1 consisted of 16 Dof proteins, of which 7 are Arabidopsis Dof-like proteins (ATG69570, ATG26790, AT5G39660, AT3G47500, AT5G62430, AT1G29160 and AT2G34140) and the remaining 9 are olive proteins (OeuDof 47, OeuDof 41, OeuDof 42, OeuDof 48, OeuDof 49, OeuDof 43, OeuDof 46, OeuDof 44, and OeuDof 45). Group C3 consisted of 4 Arabidopsis Dof-like proteins (AT4G21030, AT4G21040, AT4G21050 and AT4G21080), and none of the proteins in this clade belongs to olive. Group B2 contained 15 Dof-like proteins, of which only 3 are from Arabidopsis (AT4G38000, AT5G65590 and AT1G28310), while the other 12 are from olive (OeuDof 5, OeuDof 2, OeuDof 6, OeuDof 3, OeuDof 8, OeuDof 24, OeuDof 14, OeuDof 20, OeuDof 15, OeuDof 51, OeuDof 50 and OeuDof 32). Group C1 contained 8 Dof-like proteins, of which 4 are from Arabidopsis (AT5G62940, AT2G28510, AT3G45610 and AT5G60200) and 4 are from olive (OeuDof 13, OeuDof 23, OeuDof 16 and OeuDof 22). Group C2.1 had 15 Dof-like proteins, 5 from Arabidopsis (AT4G00940, AT3G61850, AT1G64620, AT2G46590 and AT4G24060) and the remaining 10 from olive (OeuDof 26, OeuDof 37, OeuDof 30, OeuDof 36, OeuDof 29, OeuDof 31, OeuDof 33, OeuDof 34, OeuDof 28, and OeuDof 27). Group B1 consisted of 11 Dof-like proteins, 5 from Arabidopsis (AT1G07640, AT2G28810, AT5G02460, AT3G55370 and AT2G37590) and 6 from olive (OeuDof 18, OeuDof 9, OeuDof 17, OeuDof 19, OeuDof 21 and OeuDof 25). Group A consisted of 6 Dof-like proteins, 3 from Arabidopsis (AT5G60850, AT3G21270 and AT1G51700) and 3 from olive (OeuDof 4, OeuDof 7 and OeuDof 1). Group D2 consisted of 5 Dof-like proteins, 2 from Arabidopsis (AT3G50410 and AT5G66940) and 3 from olive (OeuDof 10, OeuDof 11 and OeuDof 12). Proteins in a common clade usually show similarity in structure and function (Fig. 4), so the Dof-like proteins in the same clade may have similar structures and functions. Eighteen amino acids in the Dof domain sequences of olive and Arabidopsis were found at the same positions (Fig. S1).

Fig. 4

Phylogenetic relationships between OeuDof and AtDof proteins. OeuDof proteins are marked with red stars. The evolutionary history was inferred using the UPGMA method with 1000 bootstrap replicates. This analysis involved 86 amino acid sequences. All ambiguous positions were removed for each sequence pair (pairwise deletion option). There was a total of 852 positions in the final dataset. Evolutionary analyses were conducted in MEGA X (Sneath and Sokal 1973; Felsenstein 1985; Zuckerkandl and Pauling 1965; Kumar et al. 2018)

3.4 Location of chromosomes and assessment of gene duplication of olive Dof genes

The chromosomal distribution of the analyzed Olea europaea var. sylvestris Dof genes showed that OeuDof genes were present on various chromosomes. The maximum number of Dof genes, five, was located on chromosome 18. Chromosome 15 contained four Dof genes, three Dof genes were located on chromosome 6, two Dof genes were present on each of chromosomes 2, 3, 7, 10, 11, 16, 20, and 22, and one Dof gene was found on each of chromosomes 1, 4, 9, 12, 14, and 17. A total of 17 OeuDofs were located on scaffolds that have not yet been assigned to any chromosome in the wild olive genome assembly in Phytozome v12 (Fig. 5a).
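The per-chromosome counts reported above can be cross-checked with simple arithmetic, sketched here in Python. All numbers come from the text; the haploid chromosome number of 23 follows from 2n = 2x = 46, and the variable names are illustrative.

```python
# Dof genes per chromosome, as reported in the text.
genes_per_chromosome = {18: 5, 15: 4, 6: 3}
genes_per_chromosome.update({c: 2 for c in (2, 3, 7, 10, 11, 16, 20, 22)})
genes_per_chromosome.update({c: 1 for c in (1, 4, 9, 12, 14, 17)})

genes_on_scaffolds = 17  # genes on scaffolds not assigned to a chromosome
haploid_chromosomes = 23  # from 2n = 2x = 46

genes_on_chromosomes = sum(genes_per_chromosome.values())
total_genes = genes_on_chromosomes + genes_on_scaffolds
chromosomes_without_dof = haploid_chromosomes - len(genes_per_chromosome)
```

The tally gives 34 chromosome-anchored genes plus 17 scaffold genes, matching the 51 OeuDof genes, and leaves six chromosomes without a Dof gene.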

Fig. 5

a Distribution of OeuDof genes on olive chromosomes. Dof genes that are present at the same location within the same chromosome are coloured differently from the other genes. Arctic blue represents chromosomes with 1 Dof gene, pink 2, azure blue 3, dark blue 4 and royal blue 5, and white represents chromosomes with no Dof genes. The scale represents a 10 Mb chromosomal distance. The positions of genes on scaffolds are illustrative only, owing to the lack of full scaffold length data. b Genome-wide synteny analysis of olive Dof genes showing the dominance of segmental duplication and the rare occurrence of tandem duplication

Furthermore, synteny analysis was performed for OeuDof genes to assess segmental and tandem duplication of the OeuDof gene family (Fig. 5a, b). Among olive Dof genes, 8 paralogous gene pairs were distributed non-uniformly across the olive genome, which suggests that these genes might have arisen from segmental duplication, whereas 1 paralogous gene pair located together on the same scaffold might have resulted from tandem duplication (Fig. 5a, b).

The date of gene duplication was also estimated in MEGA X, using pairwise alignments to obtain Ks and Ka values, after which Ka/Ks was calculated manually (Fig. 6). Ks is the number of synonymous substitutions per synonymous site, Ka is the number of nonsynonymous substitutions per nonsynonymous site, and Ka/Ks is the ratio of nonsynonymous to synonymous mutations. This ratio ranged from 0.36 in the OeuDof 20/OeuDof 32 pair to 0.86 in the OeuDof 30/OeuDof 36 pair. The predicted date for the tandem duplication of the paralogous pair OeuDof 20/OeuDof 32 was 526.60 Mya, while for the remaining 8 paralogous pairs the segmental duplication dates ranged from 36.50 Mya for the pair OeuDof 2/OeuDof 6 to 196.70 Mya for the pair OeuDof 47/OeuDof 53. All 9 paralogous pairs in wild olive had Ka/Ks ratios greater than 0.3 but less than 1, which suggests that they evolved under purifying selection, with a possibility of considerable functional divergence after duplication.

Fig. 6

Time of gene duplication estimated for different paralogous pairs of olive Dof genes based on Ks and Ka values. Analyses were conducted using the Nei-Gojobori model. Ka represents the number of nonsynonymous substitutions per nonsynonymous site, Ks is the number of synonymous substitutions per synonymous site, and Ka/Ks represents the ratio of nonsynonymous (Ka) to synonymous (Ks) mutations

3.5 Analysis of cis-regulatory elements

The spatio-temporal expression of genes is affected by the presence and organization of various cis-regulatory elements at transcription factor binding sites in the promoter region. In silico analysis of cis-regulatory elements can be employed to evaluate the putative functions of genes (Bulow and Hehl 2016; Jones and Vandepoele 2020). Cis-regulatory elements with annotated functions such as light response, seed-specific, endosperm-specific, hormone-specific and meristem-specific expression, and stress response were observed (Fig. 7; Fig. S2). Notably, 35 out of 51 OeuDof genes contain the ARE element, which is essential for anaerobic induction; 35 possess Box 4 elements, part of a conserved DNA module involved in light responsiveness; 33 have the ABRE element, which is involved in the abscisic acid response; 25 possess the TGACG element, which is responsive to methyl jasmonate; 17 possess the TCA element, which is involved in salicylic acid responsiveness; 13 showed the wound-responsive WUN motif; and 12 showed TC-rich repeats, which are involved in defense and stress responses. In addition, 11 OeuDof genes possess the CAT-box, which is related to meristem expression, and the MBS element, which is related to drought inducibility; 9 possess the LTR element, which is involved in low-temperature responsiveness; 8 possess the auxin-responsive TGA element; 6 showed the GC-motif, which is involved in anoxic inducibility; 5 possess RY-elements, which are specific to seed regulation; 4 possess the GCN4_motif, which is involved in endosperm expression; 3 possess the Circadian element, which is involved in circadian control, and the gibberellin-responsive GARE-motif; and 2 showed the MSA-like element, which is involved in cell cycle regulation. However, OeuDof 40 and OeuDof 49 did not show any of the above-listed elements. The cis-regulatory elements identified among the 51 Dof genes of olive, along with their functional annotations, are shown in Fig. 7 and Fig. S2.

Fig. 7

Different cis-acting elements in putative OeuDof promoters. Elements are associated with abiotic stresses, hormone responses, growth and development. The colour legend indicates the number of cis-elements found in each OeuDof gene

3.6 Expression analysis of olive Dof gene in different organs

Differential expression patterns of all wild olive Dof genes in various developmental stages were also analyzed using the available RNA-seq data (Ramirez-Tejero et al. 2020). In that experiment, the data were collected from different tissues, namely fruit, flower, leaf, meristem, root and stem, of a mature cultivated olive tree under field conditions, and these six plant organs were studied in two biological replicates. The expression profiles of wild OeuDof genes are represented as a heat map (Fig. 8). Expression was observed for only 5 OeuDof genes, namely OeuDof 1, OeuDof 3, OeuDof 4, OeuDof 11 and OeuDof 42, because only these 5 hits were found in the available RNA-seq data. A simple hypothesis is that, since the RNA-seq data came from a mature cultivated olive tree, the remaining OeuDof genes might function in the initial and developmental stages rather than in the mature stage, or might have evolved and hence lost their original structure in cultivated olive. Because of the high similarity between these wild OeuDof genes and their corresponding RNA-seq hits, the expression of these OeuDof genes can be deduced. Based on gene expression in various tissues, OeuDof 1 can be grouped with OeuDof 11, and OeuDof 3 with OeuDof 42, as the expression level and pattern of these genes are similar; however, the extent of expression in different organs differed between the genes. OeuDof 1 showed maximum expression in the stem, followed by the flower, which suggests its involvement in stem and flower development; it showed negligible expression in the roots. OeuDof 3, on the other hand, showed significant expression only in the flower and very slight expression in leaf and meristem, and was not expressed in fruit, stem or roots. The OeuDof 4 gene was expressed in all of the organs under study, with maximum expression in leaf, stem and meristem. OeuDof 11 was highly expressed in meristem and stem, moderately expressed in roots, leaves and flowers, and expressed at a low level in the fruit. Lastly, OeuDof 42 was expressed in all organs of olive except the leaf, with its highest expression in flower and meristem.

Fig. 8

Heat map of the expression profile for the olive Dof genes in different organs in a mature olive tree. The x-axis represents the names of the six organs in a mature olive tree, and the y-axis represents different OeuDof genes. The expression levels of OeuDof genes are revealed by different colours, which increase from blue to red

3.7 miRNA targeting of OeuDof genes during drupe ripening in O. europaea var. sylvestris

miRNA analysis of two cultivars of cultivated olive (Cassanese and Leucocarpa) was carried out at 100 DAF (days after flowering) and 130 DAF to analyze the miRNAs involved in drupe ripening. For each cultivar, samples of 30 drupes were taken at both 100 DAF and 130 DAF (Carbone et al. 2019). Across the whole experiment, 19 miRNA sequences (ranging from 28 to 32 nucleotides) related to 10 wild olive (Olea europaea var. sylvestris) Dof genes were observed. In cultivar Cassanese, at 100 DAF, 5 miRNAs were identified targeting 4 wild olive Dof genes (OeuDof 40, OeuDof 44, and OeuDof 47), whereas at 130 DAF, 8 miRNAs were observed targeting 7 wild olive Dof genes (OeuDof 11, OeuDof 15, OeuDof 28, OeuDof 40, OeuDof 46, OeuDof 47, and OeuDof 48). In cultivar Cassanese, at 100 DAF, wild olive Dof genes OeuDof 40, OeuDof 44, and OeuDof 47 each had 1 matched miRNA sequence, while wild olive Dof gene OeuDof 44 had 2 miRNA sequences that matched its sequence.

miRNA sequences were also observed from the second olive cultivar under study (i.e., Leucocarpa). In the 100 DAF sample, 3 miRNA sequences were found corresponding to 3 wild olive Dof genes (one sequence per gene). These Leucocarpa miRNA sequences were 30–31 nucleotides long and aligned with wild olive genes OeuDof 1, OeuDof 44, and OeuDof 48 (Table S3). Lastly, in the fourth sample (cultivar Leucocarpa at 130 DAF), 3 miRNA sequences were found that aligned with 3 wild olive Dof genes, OeuDof 11, OeuDof 40, and OeuDof 44. These miRNA sequences were 29–30 nucleotides long.

3.8 Putative miRNA targets in Olea europaea

The miRNA sequences were retrieved from the Plant MicroRNA Encyclopedia database. The miRNAs that could potentially target wild olive (Olea europaea var. sylvestris) Dof genes were identified with the psRNATarget online tool (https://plantgrn.noble.org/psRNATarget/analysis). A total of 88 miRNAs were found that targeted 28 of the 51 wild OeuDof genes. The remaining 23 OeuDof genes were not targeted by any of these miRNAs (Table S4). The length of these miRNAs ranged from 20 to 22 nucleotides. The number of miRNAs per targeted OeuDof gene ranged from 1 to 19. OeuDof 2, 4, 7, 15, 19, 22, 27, 35, 37, 39, 43, and 46 were each targeted by only 1 mature miRNA, whereas OeuDof 16, 34 and 47 were each targeted by 2 mature miRNAs. OeuDof 9, 26, 32, 33, 41, and 49 were each targeted by 3 miRNAs. Furthermore, OeuDof 20 was targeted by 4 miRNAs, OeuDof 23 by 5 miRNAs, OeuDof 44 by 7 miRNAs, and OeuDof 45 and OeuDof 48 by 8 miRNAs each. Lastly, OeuDof 40 was targeted by 19 miRNAs, the maximum for any gene (Table S4). Among the groups, Group D1 was targeted the most, by 33 mature miRNAs, whereas Group A was targeted the least, by only 3 miRNAs. Groups B1, B2, C1, C2.1 and C2.2 were targeted by 4, 8, 8, 10, and 21 miRNAs, respectively (Table S4).

4 Discussion

Transcription factors (TFs) are important regulatory molecules and play a major role in the regulation of gene transcription and networking. The identification and characterization of transcription factors provide a better understanding of plant growth and development under environmental stimuli (Jones and Vandepoele 2020; Wen et al. 2016; Yanagisawa and Schmidt 1999).

According to the phylogenetic and domain analysis of A. thaliana (Lijavetzky et al. 2003), citrus (Wu et al. 2016a), and eggplant (Wei et al. 2018), Dof transcription factors are divided into nine subfamilies (Group A, B1, B2, C1, C2.1, C2.2, C3, D1 and D2). In this study, we used a recently released wild olive (Olea europaea var. sylvestris) genome database (https://phytozome-next.jgi.doe.gov/info/Oeuropaea_v1_0/) to identify 51 wild olive Dof genes at the genome level (Table 1). The 51 Dof genes of wild olive were classified into eight subfamilies (Group A, B1, B2, C1, C2.1, C2.2, D1 and D2) by phylogenetic analysis (Fig. 4; Table S1), following the original classification of Arabidopsis Dof genes; however, the Arabidopsis C3 subfamily, which includes AtDOF4.2 (AT4G21030) (Zou et al. 2013), had no members in the wild olive genome. AtDOF4.2 (AT4G21030) helps in seed coat formation and regulates shoot branching in Arabidopsis (Zou et al. 2013). The number of Dof genes in wild olive was lower than in banana (74 MaDof) (Dong et al. 2016) and Chinese cabbage (76 BrATDof) (Ma et al. 2015), but greater than in rice (30 OsDof) (Yang and Tuskan 2006), Arabidopsis (36 AtDof) (Yang and Tuskan 2006), and tomato (34 SlDof) (Cai et al. 2013).

The exon–intron structure can also be used as evidence for understanding the evolutionary relationships among genes or organisms (Bondarenko and Gelfand 2016; Koralewski and Krutovsky 2011). The predicted exon–intron organization revealed that 26 of the 51 wild olive Dof genes were intron-less (Table S5). In contrast, 10 wild olive Dof genes possessed a single intron located upstream of the Dof domain, and another 10 possessed a single intron located downstream of the Dof domain. Some intron-containing and intron-less genes were classified in the same group. In general, the wild olive OeuDof genes within the same subfamily shared similar exon–intron structures (Fig. 2), whereas the structures differed between subfamilies. Similar exon–intron structures have also been noted in Arabidopsis, rice, and soybean (Lijavetzky et al. 2003; Gu et al. 2013), which suggests that these structures are evolutionarily conserved.

The classification of OeuDof genes was also verified by conserved motif analysis. All of the OeuDof protein sequences were submitted to the MEME analysis tool to identify conserved motifs. A total of twenty conserved motifs were observed, all statistically significant with E-values less than 1 × 10⁻⁴⁰ (Fig. 3, Fig. S3, Fig. S4 and Table S6). The motifs identified by MEME were between 15 and 50 amino acids in length. Among them, Motif 1 is common to all wild olive Dof proteins and corresponds to the CX2CX21CX2C single zinc-finger structure in the Dof domain, which is the highly homologous core region of the Dof family (Fig. 1). While all of the Group D2 proteins and many of the Group B2, C2.2, and A proteins contain only Motif 1, some Dof proteins have additional specific motifs, which may be relevant to different functions. The Dof proteins from Group D1 had the most complex motif pattern, with Motif 2, Motif 3, Motif 5, Motif 10, and Motif 14 being specific to this group. Although Group B1 members have a relatively simple motif pattern compared with Group D1, they also had group-specific motifs, such as Motif 7 and Motif 16, but not all group members have these motifs. To understand the potential roles of the Group D1-specific motifs, GO annotations of the Group D1 genes in Arabidopsis were checked. Interestingly, in comparison with the Arabidopsis Dof genes in other groups, most of the AtDof genes in Group D1 have flower development-related annotations, such as “flower development”, “flowering”, “negative regulation of short-day photoperiodism”, “negative regulation of long-day photoperiodism”, “regulation of timing of the transition from vegetative to reproductive phase”, and “vegetative to reproductive phase transition of meristem”, which implies possible functional divergence of the Dof genes in Group D1 (Table S7).

The distribution of motifs among the wild olive Dof proteins (Fig. 3) is indicative of their evolutionary relationships as deduced from the phylogenetic tree (Gupta et al. 2015; Malviya et al. 2015). The motif analysis by MEME (Fig. 3), the domain analysis using NCBI CDD (Fig. 3), and the alignment of the wild olive OeuDof protein sequences (Fig. 1) revealed a highly conserved Dof domain, which was located in the N-terminal region of 42 OeuDof proteins and in the central region of the other 9 (Figs. 1, 3). Dof transcription factors have been evolutionarily conserved among plants. Apart from the Dof domain, nineteen distinct motifs were identified that were differentially distributed among the wild olive OeuDof proteins (Fig. 3). Members of the same subfamily shared at least one or two conserved motif types with similar spatial distributions, whereas differences were present between members of different subfamilies, implying certain functional similarities among the wild olive Dof members within a subfamily. In addition, the wild olive OeuDof genes showed structural conservation within subfamilies, consistent with other plants such as Arabidopsis, banana, rice, and chickpea (Dong et al. 2016; Lijavetzky et al. 2003; Nasim et al. 2016; Yang and Tuskan 2006). Furthermore, in silico analyses predicted that 9 deduced wild olive OeuDof proteins harboured NLSs to guide their localization to the nucleus (Table S2), whereas subcellular localization analysis using the online tool WoLF PSORT (https://wolfpsort.hgc.jp/) predicted nuclear localization for all OeuDof proteins except OeuDof 18 (Table S2).

The duplication of genes can be inferred from their locations on the chromosomes. Duplicated genes that lie on the same chromosome can be the result of tandem duplication, whereas duplicated genes located on different chromosomes might be the result of segmental duplication (Panchy et al. 2016). The high density of olive Dof genes on chromosomes 18 and 15 (Fig. 5a) is an indication of tandem duplication; however, the synteny analysis showed a predominance of segmental rather than tandem duplication (Fig. 5b). A predominance of segmental duplication was also observed in the chickpea (Nasim et al. 2016) and pigeon pea (Malviya et al. 2015) Dof gene families. Gene duplication is the main driver of gene family expansion, and the increase in the number of Dof genes in higher plants may be due to duplication of the domain during the evolution of eukaryotic plants (Taylor and Raes 2004; Moore and Purugganan 2005).
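The location-based rule described above can be illustrated with a short sketch. It assumes only the simple criterion stated in the text (same chromosome versus different chromosomes); the gene names, chromosome labels, and the function name `classify_pair` are hypothetical and do not come from the synteny analysis in Fig. 5b.

```python
from collections import Counter

def classify_pair(chrom_a: str, chrom_b: str) -> str:
    """Label a duplicated gene pair by chromosome location only.

    Same chromosome -> candidate tandem duplication;
    different chromosomes -> candidate segmental duplication.
    """
    return "tandem" if chrom_a == chrom_b else "segmental"

# Hypothetical duplicated gene pairs, each member given as (gene, chromosome).
duplicated_pairs = [
    (("geneA", "chr18"), ("geneB", "chr18")),
    (("geneC", "chr15"), ("geneD", "chr03")),
    (("geneE", "chr07"), ("geneF", "chr18")),
]

# Summarize how many pairs fall into each candidate category.
summary = Counter(classify_pair(a[1], b[1]) for a, b in duplicated_pairs)
print(summary)
```

In practice, tandem calls usually also require that the two genes lie close together on the chromosome, which this sketch does not check.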

The Ka/Ks ratio provides an understanding of the selection pressure on amino acid substitutions (Fig. 6). A Ka/Ks ratio < 1 suggests purifying selection, whereas a Ka/Ks ratio > 1 suggests positive selection (Yang and Bielawski 2000; Hurst 2002). In general, evaluating selective pressure provides leads on which amino acid sequences have been altered in a protein and is also necessary for interpreting functional residues and functional protein shifts (Morgan et al. 2010). The Ka/Ks ratios of sequences from the different olive Dof groups varied considerably. Despite these differences, the estimated Ka/Ks values ranged from 0.36 to 0.86; because all values were less than 1, the Dof sequences in each group appear to have undergone purifying selection, and positive selection might have acted on only a few sites during evolution (Fig. 6).
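The interpretation rule above can be written as a small helper. This is a minimal sketch: the function name `selection_mode` and the Ka and Ks inputs are hypothetical, chosen only so that the ratios match the endpoints of the range reported in the text; treating a ratio of exactly 1 as neutral evolution follows the standard convention and is not stated in the text.

```python
def selection_mode(ka: float, ks: float) -> str:
    """Interpret a Ka/Ks ratio as a mode of selection."""
    if ks <= 0:
        raise ValueError("Ks must be positive to compute Ka/Ks")
    ratio = ka / ks
    if ratio < 1:
        return "purifying"
    if ratio > 1:
        return "positive"
    return "neutral"

# Hypothetical Ka and Ks values giving ratios of about 0.36 and 0.86.
for ka, ks in [(0.18, 0.50), (0.43, 0.50)]:
    print(f"Ka/Ks = {ka / ks:.2f}: {selection_mode(ka, ks)}")
```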

Cluster analysis provides important clues about the function of olive Dof genes. OeuDof genes showed specific spatial and temporal expression patterns in different organs and developmental stages. As mentioned above, expression data for 5 of the 51 OeuDof genes were found in the dataset downloaded from NCBI GEO (Ramirez-Tejero et al. 2020). The expression comparison shows that different OeuDof genes are expressed in different organs (Fig. 8). OeuDof 4 and OeuDof 1 were expressed more strongly in flower meristem and stem than in the underground part, which suggests a role in stem meristem and leaf development. OeuDof 4 and OeuDof 1 fall in Group A, which includes AtDof 1 (AT1G51700), AtDof 2 (AT3G21270) and AtOBP 4 (AT5G60850) (Peng and Weselake 2011; Ramirez-Parra et al. 2017; Rymen et al. 2017; Xu and Cai 2019; Xu et al. 2016). AtDof 1 (AT1G51700) was expressed during the early globular embryo stage, and AtDof 2 (AT3G21270) exhibited a similar expression profile during seed development. These genes are generally expressed throughout the plant, more specifically in root, stem, leaf, flower, seed, guard cell, plant embryo, and pollen (Peng and Weselake 2011).

OeuDof 3 (a Group B2 member) was expressed in flowers but not in stem, root, or fruit. This pattern suggests that the gene might be associated with the reproductive function of O. europaea var. sylvestris. In Arabidopsis, the members of Group B2 are AT5G65590, AT4G38000 and AT1G28310 (Moreno-Risueno et al. 2007; Yanagisawa 2002). These results are consistent with those for the Arabidopsis orthologues in Group B2, which are also expressed during early flower development (Wellmer et al. 2006), a key process in the plant life cycle during which floral patterning and the specification of floral organs are established (Wellmer et al. 2006). The cis-regulatory analysis also predicted that OeuDof 3 has roles related to light and abiotic stress (Fig. 7). OeuDof 11 showed a strong expression pattern very similar to that of OeuDof 4, although at a slightly lower level. OeuDof 11 (Group D2) is closely related to AT5G62940, AT5G66940 and AT3G50410, which are involved in the regulation of cambium formation and vascular tissue development, particularly at a very early stage of inflorescence stem development, and which promote both cambium activity and phloem specification but prevent xylem specification. These genes are also expressed in carpel, cauline leaf, collective leaf structure, flower, flower pedicel, hypocotyl, inflorescence meristem, petal, plant embryo, root, seed, sepal, shoot apex, shoot system, stamen, stem, and vascular tissue of leaf, that is, collectively, in the whole plant (Guo 2009; Miyashima 2019; Yanagisawa 2002). In olive, the orthologs of these three Arabidopsis proteins are OeuDof 10, OeuDof 11, and OeuDof 12. It can therefore be inferred that these OeuDof proteins may have roles in wild olive similar to those of their orthologs in Arabidopsis.

Finally, OeuDof 42 (Group D1) showed slight to moderate expression in various organs and parts of the plant. The members of Group D1 in Arabidopsis are AT1G29160, AT2G34140, AT3G47500 and AT5G39660, which are transcriptional repressors of CONSTANS expression and thus regulate the photoperiodic flowering response (Fornara 2009; Fornara et al. 2009; Imaizumi 2005). These proteins are homologs of CYCLING DOF FACTOR 1 (CDF1), which interacts with FKF1 and regulates CO expression (Imaizumi 2005).

MicroRNAs (miRNAs) are important regulators in plants that influence almost every biological process, ranging from growth and development to combating pathogens and maintaining proper internal conditions (Carbone et al. 2019; Samad 2017; Spanudakis 2014; Terzi 2008). miRNAs are highly conserved among species, which suggests that a given miRNA performs a similar function regardless of the species in which it is observed. Cycling DOF factors (CDFs) play an important role in blue light signalling. AtCDF2 acts as a transcriptional activator or repressor of a group of miRNA genes and binds to the pri-miRNA transcripts (Sun et al. 2015). CDF2 is a suppressor of miRNA biosynthesis that acts by targeting the Dicer-like 1 (DCL1) complex and suppressing the processing of primary miRNAs. CDF2 works in the same pathway as miR159 or miR172 to control flowering (Sun et al. 2015). OeuDof 42 (a putative olive CDF in Group D1) showed slight to moderate expression in various organs, including flowers. In tomato, SlCDF4 expression was detected during fruit ripening, whereas SlCDF5 transcripts were abundant only in green fruit, and SlCDF2 showed similar expression in green and red fruit (Corrales et al. 2014). Five wild olive Dof genes, OeuDof 40, OeuDof 43, OeuDof 44, OeuDof 47, and OeuDof 48, belong to Group D1 and are CDF partners in olive. In this study of possible miRNA targets, these olive CDFs were identified as targets of 19 miRNAs, both in miRNA data from drupe development in olive and in the miRNA data downloaded from different miRNA databases.

OeuDof 47 was expressed only in cultivar Cassanese, in both the 100 DAF and 130 DAF samples, whereas OeuDof 48 was observed in the “Cassanese 130 DAF” and “Leucocarpa 100 DAF” samples. The miRNA analysis also indicated that OeuDof 40 and OeuDof 44 were expressed in the highest number of samples. OeuDof 40 was present in all samples except “Leucocarpa 100 DAF”, and OeuDof 44 was found in all samples except “Cassanese 130 DAF”. The results thus showed variation in the expression of the miRNAs. Because the two cultivars differ in the colour of the fruit epidermis and mesocarp, and the expression of OeuDof genes also differs between the cultivars, these results imply that OeuDof genes might play some role in the pigmentation of the olive fruit. OeuDof 48 and OeuDof 49 are also CDF partners and belong to Group D1 (Fig. 4; Table 2; Table S1). According to the data gathered from the Plant MicroRNA Encyclopedia (http://pmiren.com/), both of these OeuDof genes were targeted by miR164 at a single site. Moreover, it can be inferred that inhibiting the expression of OeuDof 48 and OeuDof 49 may induce flowering and may also enhance the drought resistance of the plant. miR166 regulates floral development by affecting the morphogenesis of flowers (Jung and Park 2007). In olive, miR166 was observed to target two genes, OeuDof40 and OeuDof35, and it can be inferred that floral development may be regulated by inhibiting these OeuDof genes (Table 2). miR172 has been observed to be involved in multiple processes (Jung et al. 2014; Kim et al. 2014; Li et al. 2016; Sun et al. 2015; Yamashino et al. 2013; Yang et al. 2015). This miRNA acts at transitional stages, such as the transition from the juvenile to the adult stage, and is also important in regulating the shift of the plant from the vegetative to the reproductive stage at maturity. Additionally, it regulates the proper development of the flower. In olive, miR172 was seen to target three genes, OeuDof44, OeuDof45, and OeuDof47, at a specific site in each gene (Table 2). The miR172 sequences acting upon OeuDof44 and OeuDof45 were different but nearly identical. The orthologues of OeuDof44, OeuDof45, and OeuDof47 in Arabidopsis are CDFs and are involved in the suppression of CONSTANS, which is required for floral initiation and development (Fornara 2009; Imaizumi 2005; Sun et al. 2015). These three OeuDof genes might therefore act in opposition to some or all of the miR172 roles mentioned above.

Table 2 Information about miRNAs, their functions and target IDs

The miR159 family is a very abundant miRNA family and represents one of the most ancient miRNA families in the plant kingdom (Wu et al. 2009); it has three members in Arabidopsis, miR159a, miR159b, and miR159c. miR159 is involved in the transition from the vegetative to the reproductive phase, as well as in the regulation of flower development (Wu et al. 2009). This miRNA group was found to target three OeuDof genes, OeuDof9, OeuDof20, and OeuDof32 (Table 2). In all three genes, miR159 targeted three different regions, as three types of miR159 were observed targeting these Dof genes. This suggests that OeuDof9, OeuDof20, and OeuDof32 might be involved in promoting vegetative growth and inhibiting floral and reproductive development.

5 Conclusion

In this study, we reported a comprehensive analysis of OeuDof transcription factor genes in the wild olive genome. The 51 OeuDof genes were categorized into eight subgroups, and some of the structural and functional properties of each OeuDof member were characterized. The available expression data suggested that the examined OeuDof genes are mainly involved in flower and stem development. miRNA data on OeuDof genes possibly targeted during drupe development in olive suggested a role in fruit growth and development. The detailed computational characterization of olive Dof proteins presented here might be used for selection and cloning at the molecular level, for profiling gene expression, and for studying interactions with other transcription factors. The presence of similar numbers of Dof genes in some plants, such as 34 in tomato, 34 in pepper, and 35 in potato, and of relatively more Dof genes in other plants, such as 78 in soybean and 51 in olive, suggests that duplications might have led to the expansion of the Dof gene family in some species.