Introduction

The palm family (Arecaceae) is monophyletic: all palms descend from a single ancestral lineage and form a closely related phylogenetic group (Comer et al. 2016). Cássia-Silva et al. (2020) identified notable evolutionary shifts in growth form and size within this family. Nadot et al. (2016) examined the evolution of sexual systems in palms, taking advantage of recent advances in palm phylogenetics. Furthermore, phylogenomic techniques have enabled comprehensive investigations into the evolution of gene families in palms (Barrett et al. 2016). However, gaps remain in our understanding of the relationships among palm genera and species. Barrett et al. (2019) proposed that historical polyploidy events may have affected the genomic evolution of palms, an area of research that remains incompletely understood.

Palms such as oil palm, coconut, and date palm differ in flower development. Palms usually produce separate male and female flowers organized in specific clusters (Castaño et al. 2014). They can be either monoecious or dioecious, with male and female flowers borne on the same plant or on different plants, respectively. The oil palm is monoecious, with a spadix inflorescence containing both male and female flowers (Adam et al. 2007). Coconut commonly bears both male and female flowers on the same inflorescence, a condition known as monoecy (Nayar 2018). However, there are dioecious variations in which separate male and female plants exist, with the wind playing a role in pollination. Date palm is dioecious, with distinct male and female plants (Al-Ameri et al. 2016). These variations in flower structure and reproductive system underscore the diversity among these economically important species.

Conserved coding regions play a crucial role in evolutionary research by offering valuable information about the structure and function of genes. The conserved sequences described by Moutinho and Eyre-Walker (2022) and Shan et al. (2022) contribute to our understanding of codon organization and coevolutionary dynamics. Yıldırım and Vogl (2023) argue that conserved sequences also aid in the identification of regulatory elements and transcription factor-binding sites, which play a crucial role in gene control and the development of phenotypic traits. Comparative genomic analyses of gene order provide insights into evolutionary processes (Herrig et al. 2023; Xu et al. 2022), whereas similarities in gene structure aid in tracing coevolution during speciation (Fuertes et al. 2019).

Polosoro et al. (2021) recently identified phosphatidylethanolamine-binding proteins (PEBPs) in oil palm. These proteins are evolutionarily conserved and play important roles in diverse biological processes (Wang et al. 2019), including the transduction of cellular signals (Hoogenboom et al. 2016; Leeggangers et al. 2018; Tribhuvan et al. 2020; Zhang et al. 2016). In addition to their signaling role, PEBPs affect various aspects of plant growth and development, including the regulation of flowering time, circadian rhythms, and seed development (Wang et al. 2016; Zhang et al. 2016).

Yang et al. (2019) classified plant PEBPs into three main clades: FLOWERING LOCUS T (FT)-like, TERMINAL FLOWER 1 (TFL1)-like, and MOTHER OF FT AND TFL1 (MFT)-like. FT-like proteins, such as FT and TWIN SISTER OF FT (TSF), promote flowering, as shown by Ho and Weigel (2014) in Arabidopsis and Li et al. (2015) in cotton. In contrast, TFL1-like proteins, including TFL1, BROTHER OF FT AND TFL1 (BFT), and CENTRORADIALIS (CEN), act as inhibitors of flower development, as demonstrated in apple and Arabidopsis (Haberman et al. 2016; Ahn et al. 2006; Yoo et al. 2010). Chen et al. (2018) found that MFT-like proteins in longan (Dimocarpus longan) primarily participate in the stress response and seed germination rather than flowering.

The categorization of the PEBP subfamilies aids in understanding their varied functions and regulatory roles in plant growth and flowering. Numerous studies have examined the contrasting roles of FT and TFL1 in flowering in Arabidopsis thaliana, despite their high sequence similarity (Bennett and Dixon 2021; Nakano et al. 2015; Z. Wang et al. 2017; Hanzawa et al. 2005; Ahn et al. 2006). Certain amino acids found in FT paralogs can hinder the flowering process in species such as sugar beet (Beta vulgaris). In addition, genes resembling FT and TFL1 have been observed to control the growth and termination of meristems in perennial and cyclically growing species (Pin et al. 2010). This study seeks to enhance our understanding of the evolutionary dynamics of PEBP genes in palm species. We aimed to trace the evolutionary path of PEBP-like genes in important palm species, namely oil palm, coconut, and date palm, by examining publicly available genomic DNA, mRNA, and protein sequences. Our study reveals the evolutionary connections and genetic associations among PEBP genes in these species, including the identification of FT-like genes originating from at least four distinct gene groups. The consistent relational patterns in specific sequences observed in oil palm, coconut, and date palm imply a potential association with the regulatory mechanisms that control their flowering processes, knowledge of which is useful for accelerated breeding programs.

Methods

Sequence retrieval

PEBP protein sequences were collected from relevant references and databases. To obtain a comprehensive set of PEBP protein sequences in the palm family, we used the Arabidopsis thaliana PEBP sequences AtFT1 (BA77838), AtTFL1 (At5g03840), and AtMFT (NP_173250.1) from the National Center for Biotechnology Information (NCBI) database as queries in a BLASTp search restricted to the Arecaceae family (taxid: 4710). Because coconut mRNA data are scarce in the NCBI database, several coconut PEBP sequences were instead obtained from the Coconut Genome Database (GCA_008124465) through a BLASTn search using PEBP sequences from oil palm and date palm as queries (Supplementary 1). This ensured the incorporation of the coconut data into our analysis.

Construction of phylogenetic trees and sequence alignment

The MUSCLE alignment tool was used to align protein and mRNA sequences. Phylogenetic analysis was performed using the maximum likelihood method in the Molecular Evolutionary Genetics Analysis (MEGA) software package. The resulting phylogenetic trees were visualized and annotated using Interactive Tree of Life. Furthermore, we conducted a pairwise distance analysis of the mRNA sequences in MEGA to assess the genetic distances among the PEBP genes and thereby quantify their genetic divergence.
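The pairwise distance step can be illustrated with a minimal sketch. MEGA offers several substitution models and the text does not state which one was used, so the example below assumes the simple p-distance (the proportion of differing sites) with pairwise deletion of gapped positions. The function names and the toy sequences are hypothetical and serve only as an illustration.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap ('-') are skipped
    (pairwise deletion). The p-distance is an assumed model here,
    not necessarily the one used in the study.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differing / compared


def distance_matrix(alignment):
    """Pairwise p-distances for every pair of sequences in an alignment.

    `alignment` maps a sequence name to its aligned sequence.
    """
    return {
        (name_a, name_b): p_distance(alignment[name_a], alignment[name_b])
        for name_a, name_b in combinations(alignment, 2)
    }


# Toy aligned fragments (hypothetical, for illustration only).
toy_alignment = {
    "seq_oil_palm": "ATGGCTAGGGAT",
    "seq_coconut": "ATGGCTAGGGAC",
    "seq_date_palm": "ATGGCAAGAGA-",
}
toy_distances = distance_matrix(toy_alignment)
for pair, distance in toy_distances.items():
    print(pair, round(distance, 3))
```

Lower values in the resulting matrix correspond to more similar sequences, which is the reading applied to the distance matrix in the Results.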

Protein structure homology modeling with SWISS-MODEL

The SWISS-MODEL platform was used to model protein structures from their sequences and to predict their functional characteristics. Model reliability was assessed using Qualitative Model Energy Analysis (QMEAN). The SWISS-MODEL process involves identifying compatible templates, aligning target sequences with these templates, constructing a model based on the alignment, and evaluating the quality of the model. The metrics reported by SWISS-MODEL include sequence similarity data, Global Model Quality Estimate (GMQE), and QMEANDisCo. These metrics allowed a thorough assessment of the models and provided insights into the structural and functional characteristics of the proteins under investigation.

In silico expression analysis of coconut and oil palm PEBP genes

An in silico expression study was performed to investigate whether the expression of PEBP genes was correlated within the palm family. Coconut and oil palm were selected because transcriptome data were available for several organs at different stages. Transcriptomic Sequence Read Archive (SRA) data for mature leaves (ERR3588913 and SRR25119995), roots (SRR22255955 and SRR7812013), male flowers (DRR129238 and DRR053157), and female flowers (DRR045028 and DRR053155) were obtained from the NCBI database. SRA data were extracted using Fastq and quantified using Kallisto to obtain transcripts per million (TPM) values for PEBP genes in coconut and oil palm. TPM values were examined using Heatmapper (University of Alberta, Canada) to explore differential expression.
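Kallisto reports transcripts per million directly, so no separate calculation is needed in practice. The sketch below only illustrates the standard definition of the measure: each transcript's estimated count is divided by its effective length, and the resulting rates are rescaled to sum to one million. The function name and the toy numbers are hypothetical.

```python
def transcripts_per_million(est_counts, eff_lengths):
    """Convert estimated counts and effective lengths to TPM values.

    TPM_i = (count_i / eff_length_i) / sum_j(count_j / eff_length_j) * 1e6
    """
    if len(est_counts) != len(eff_lengths):
        raise ValueError("counts and lengths must have the same length")
    if any(length <= 0 for length in eff_lengths):
        raise ValueError("effective lengths must be positive")
    # Length-normalized rate for each transcript.
    rates = [count / length for count, length in zip(est_counts, eff_lengths)]
    total = sum(rates)
    if total == 0:
        return [0.0 for _ in rates]
    return [rate / total * 1e6 for rate in rates]


# Toy example with three hypothetical transcripts.
toy_counts = [100.0, 200.0, 300.0]
toy_lengths = [1000.0, 1000.0, 3000.0]
toy_tpm = transcripts_per_million(toy_counts, toy_lengths)
print([round(value) for value in toy_tpm])
```

Because the values within a sample always sum to one million, they describe relative abundance within that sample, which is worth keeping in mind when comparing tissues sequenced in different experiments.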

Results and discussion

Grouping of PEBP genes in the palm family

BLASTp and BLASTn searches identified 27 unique PEBP-like proteins in the three palm species (Table 1). To ensure the accuracy of our findings, we examined each predicted gene, and multiple MUSCLE alignments were performed to verify the presence of an intact and identifiable PEBP domain. After identification and verification of the sequences, we categorized the palm PEBPs into clades and subclades. This classification is essential for understanding the evolutionary relations among the proteins. Table 1 lists the gene name and protein accession code of each protein, as annotated in the NCBI protein database. We used the NCBI gene ID information to identify the precise genomic locations of the corresponding genes. Additional predicted proteins are indicated by gene names marked with asterisks. This approach improved data reliability and supported a deeper understanding of the genetic architecture and evolutionary dynamics of PEBP-like proteins in the palm family.

Table 1 PEBP identification in coconut, oil palm, and date palm using BLASTp in the NCBI database

Phylogenetic analyses

The maximum likelihood method was used for phylogenetic reconstruction. We aligned PEBP protein and mRNA sequences from Arabidopsis and three palm species: oil palm, date palm, and coconut. The phylogenetic trees in Fig. 1a, b exhibited consistent topologies, supporting the reliability of our alignment and reconstruction methods. The 27 PEBP proteins were classified into three main groups: the FT clade, TFL clade, and MFT clade. In the FT clade, 14 FT-like proteins were categorized into two subclades: subclade I, consisting of eight members, and subclade II, consisting of six members (Fig. 1a, b and Table 1). None of the FT-like proteins in subclade I grouped with the reference proteins AtFT and AtTSF, suggesting that these proteins have distinct sequence patterns. PEBP subclades in sugarcane, soybean, and sorghum show similar variations, with two, three, and three subclades, respectively (Książkiewicz et al. 2016; Lee et al. 2021; Venail et al. 2022). Lee et al. (2021) found that most FT proteins in various subclades play a role in promoting flowering, albeit with varying degrees of impact.

Fig. 1

Phylogenetic analysis of a the PEBP protein family, with FT, TFL, and MFT reference protein sequences for comparison, and b PEBP mRNA in oil palm, coconut, and date palm. Both protein and mRNA sequences clustered into three clades: FT (grey and green), TFL (red), and MFT (yellow). (Color figure online)

Proteins from the palm species tended to form distinct subclades within the TFL clade (Fig. 1). These subclades differed markedly from their Arabidopsis counterparts, such as AtTFL, AtCEN, and AtBFT. Divergence was also observed within the MFT clade, with marked differences between the palm MFT clade proteins and the Arabidopsis representative AtMFT. These findings underline the evolutionary diversity and adaptability of PEBP family members, although certain proteins display conserved characteristics among species. Oil palm proteins were more similar to coconut proteins than to date palm proteins. The comparative analysis of mRNA sequences (Fig. 1b) gave a consistent picture, so the protein and mRNA analyses together support our interpretation of the evolutionary history and functional differentiation of these proteins within the palm family.

Evolutionary divergence of PEBP mRNA

Figure 2 shows the pairwise distance results for the mRNA sequences of the PEBP family. Lower values indicate closer sequence proximity, and higher values indicate greater divergence. These results agreed with the phylogenetic patterns, showing high sequence similarity within specific groups. In Fig. 2, box a, FT-like subclade I exhibited a high degree of sequence homogeneity among its members. The coconut sequences occupied an intermediate evolutionary position relative to the oil palm and date palm sequences, as indicated by their lower divergence values against both species. This suggests a closer evolutionary relation between coconut and each of these two palms. In contrast, the divergence between oil palm and date palm sequences was considerably greater, suggesting a substantial evolutionary gap. Based on these findings, the PEBP family was classified into six groups: FT-like subclade I, one group from FT-like subclade II, one group from MFT-like, and two groups from the TFL clade. This categorization is depicted in Fig. 2a–f. This pattern of sequence proximity is not limited to comparisons between species but also covers duplications within the same species. Furthermore, we identified species-specific genes whose functions may differ substantially. One example is the oil palm gene encoding XP_010912140, which carries a sequence alteration that may modify protein function, even though similar variants are present in other species. The details of these findings are presented in Supplementary Table 2.

Fig. 2

Evolutionary divergence of PEBP mRNA sequences from oil palm, coconut, and date palm, presented as a distance matrix. Boxes a–f mark groups of mRNAs with closely similar sequences, which suggests that the members of each group originated from the same ancestral gene

Characterization of PEBP genes in the palm family

Oil palm, coconut, and date palm, which belong to the Arecaceae family, exhibit genetic similarities due to their shared ancestry. Khan et al. (2018) found evidence supporting a closer genetic relation between oil palm and coconut than between oil palm and date palm. Our results support these findings: the PEBP genes of oil palm and coconut were highly similar, whereas they showed less resemblance to those of date palm. This disparity underlines the variation in genetic evolution within the Arecaceae family. Genetic proximity between PEBP-coding genes also indicated that oil palm contains a higher copy number of all PEBP protein members. Gene duplication has been reported in the Arecaceae family, including oil palm, coconut, and date palm (Barrett et al. 2019), and may explain the origin of the similar PEBP gene sequences in these plants. It also points to the genetic mechanisms that contribute to the diversity of PEBP sequences among palm species.

The greater similarity of PEBP protein and mRNA sequences between coconut and oil palm than between either species and date palm indicates a close evolutionary relation. Hypothetically, the evolutionary relation among these three species may be related to the transition in flower structure during their evolution. Coconut plants generate hermaphrodite flowers, whereas oil palm produces separate male and female flowers, rendering it monoecious. Date palms, on the other hand, are dioecious. The concepts of monoecy and dioecy are closely related to evolutionary processes. According to Cronk (2022), monoecy, in which male and female flowers exist on the same plant, is considered the primary pathway leading to dioecy, in which sexual functions are divided among distinct individuals. According to a recent study by Muyle et al. (2021), the transition from monoecy to dioecy can be achieved through mutations in a single gene. This evidence suggests that oil palm might be a transitional form between coconut and date palm.

Crucial amino acid residues

MUSCLE alignment of the 27 proteins revealed notable variation at crucial amino acid residues within the PEBP domain (Fig. 3). The tyrosine residue reported by Hanzawa et al. (2005) as important for FT/Hd3a function is shown in Fig. 3, box a. Hanzawa et al. substituted this tyrosine with histidine in AtFT1, which gave the protein TFL1-like activity and delayed flowering, similar to the effects observed when AtTFL1 was overexpressed. A consistent pattern was observed in segment A (Fig. 3, box a): all members of FT subclade I carried tyrosine, which promotes flowering. In contrast, most proteins in FT subclade II carried histidine, resembling TFL proteins, except for XP_010912140, which retained a tyrosine residue. All MFT proteins contained tryptophan at this site, indicating potential functional divergence due to single amino acid variations. Segment B (Fig. 3, box b) displayed consistent amino acid sequences in all FT proteins, in agreement with the findings of Ahn et al. (2006). In FT subclade I, this region contained 13 of 15 identical amino acids, indicating a possible conserved region.

Fig. 3

Complete protein sequence alignment of the reference proteins and all palm PEBP proteins. Horizontal arrows indicate the position of the PEBP domain. Block a indicates the tyrosine residue described by Hanzawa et al. (2005). Block b indicates the fourteen-amino-acid segment reported by Ahn et al. (2006). Block c shows the region corresponding to the LYN triad, as reported by Ahn et al. (2006). These characteristics are used to identify proteins with FT/Hd3a functionality

In contrast, subclade II demonstrated less uniformity in this region. Segment C (Fig. 3, box c) contains a distinct LYN amino acid triad that occurs in FT proteins but not in TFL or MFT proteins. This triad was observed in both FT subclades, except in XP_010912140. The conservation of segments B and C is important due to their role in forming a helix on the protein surface, which contributes to the formation of the central β-sheet. Segment C exhibited greater homology between TFL1 and FT homologs than segment B did. However, the triad within FT remained predominantly consistent, distinguishing it from TFL1. This emphasizes the difficulty of modifying segment C and its significance for protein structure and function (Ahn et al. 2006).
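The residue pattern described for box a (tyrosine in FT subclade I, histidine in TFL-like proteins, tryptophan in MFT-like proteins) can be turned into a simple screening rule. The following is a minimal sketch under stated assumptions: the alignment column index, the function name, and the five-residue toy sequences are all hypothetical, and a match at a single site is only a hint of possible function, not evidence of it.

```python
# Residue at the box a position and the protein class it is associated with
# in the alignment described in the text.
RESIDUE_TO_CLASS = {
    "Y": "FT-like",   # tyrosine
    "H": "TFL-like",  # histidine
    "W": "MFT-like",  # tryptophan
}


def classify_by_key_residue(aligned_sequence, column):
    """Classify an aligned protein by the residue at one alignment column.

    `column` is a 0-based index into the alignment; the correct value
    depends on the actual alignment and is an assumption here.
    """
    residue = aligned_sequence[column].upper()
    return RESIDUE_TO_CLASS.get(residue, "unclassified")


# Hypothetical aligned fragments with the key site at column 2.
toy_fragments = {
    "example_ft": "GDYRT",
    "example_tfl": "GDHRT",
    "example_mft": "GDWRT",
    "example_gap": "GD-RT",
}
toy_classes = {
    name: classify_by_key_residue(fragment, column=2)
    for name, fragment in toy_fragments.items()
}
print(toy_classes)
```

A rule like this would flag XP_010912140 as FT-like at this site even though it sits in subclade II, which is why the text treats the other segments (B and C) as additional criteria.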

Modeling the protein structure

We used the SWISS-MODEL platform to predict the structures of PEBP proteins belonging to FT subclade I, FT subclade II, and the TFL clade. To assess the accuracy of these models, we compared the predicted structures with that of AtFT, a well-characterized reference protein (Table 2). The amino acid sequence similarity within FT-like subclade I ranged from 70.83% to 73.4%, and the GMQE values ranged from 0.84 to 0.87. For FT-like subclade II, the GMQE scores ranged from 0.82 to 0.87, with sequence similarities from 66.27% to 71.17%. The local quality estimates provided additional support for these results, as all amino acid residues scored above 0.6, indicating their suitability for the modeling process (Santhoshkumar and Yusuf 2020).

Table 2 PEBP protein modeling of FT-like subclade I, FT-like subclade II, TFL-like, and MFT-like proteins compared to AtFT proteins

The high GMQE values and local quality estimates indicated structural similarity between the FT subclades I and II, which was also validated by protein modeling (Fig. 4a–c). The presence of variations in amino acid residues within the three crucial conserved segments of these proteins suggests possible functional differences. Functional variations among homologous genes are frequently observed across a wide range of species. Mulki et al. (2018) discovered that the Flowering Locus T3 in barley, which is associated with the FT gene, triggers spikelet growth but does not stimulate flowering. In contrast, Pieper et al. (2021) demonstrated that the Flowering Locus T4 could postpone the flowering process in barley. These examples demonstrate the functional diversity that arises from variations in homologous gene families. Protein modulation mechanisms provide an explanation for changes in protein function that do not involve structural changes. One notable mechanism is allosteric modulation, in which the binding strength of a ligand at one site (allosteric site) influences the interaction of a different molecule at a distant site (active site) (McLeish et al. 2015). This process involves conformational changes in the protein structure, allowing it to respond dynamically to environmental cues or signaling molecules.

Fig. 4

Predicted protein structure models for a AtFT, b XP_010925712 EgHd3a, c XP_010912140 EgHd3a, d AtTFL1, and e XP_010907126 EgSP were generated using SWISS-MODEL. A template search was performed against the SWISS-MODEL template library using BLAST and HHblits. Blue denotes regions relatively similar to the reference proteins, while red indicates significant differences. (Color figure online)

The GMQE values for the TFL members ranged from 0.85 to 0.87, with sequence similarities ranging from 64.33% to 73.10%. These GMQE scores indicate that the modeled structures are highly similar to the reference structure AtTFL1. However, the local quality estimates revealed notable structural disparities at amino acid residues 33–36 and 134–138 (segment B). In these regions, the values dropped well below the threshold of 0.6, indicating substantial structural deviation from the reference, as also seen in the protein models (Fig. 4d, e). These results are consistent with the report that either segment (B or C) is adequate for TFL activity (Ahn et al. 2006).
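Locating stretches such as residues 33–36 and 134–138 amounts to finding contiguous runs of per-residue scores that fall below the 0.6 threshold. The sketch below shows one way to do this on a plain list of local quality scores. The function name and the toy scores are hypothetical, and how the per-residue values are exported from SWISS-MODEL is left aside.

```python
def low_quality_regions(scores, threshold=0.6):
    """Return contiguous residue ranges whose score is below the threshold.

    `scores` holds one local quality estimate per residue, in sequence
    order. Ranges are returned as 1-based, inclusive (start, end) tuples.
    """
    regions = []
    start = None
    for position, score in enumerate(scores, start=1):
        if score < threshold:
            if start is None:
                start = position  # a new low-quality run begins here
        elif start is not None:
            regions.append((start, position - 1))
            start = None
    if start is not None:
        # The sequence ended while still inside a low-quality run.
        regions.append((start, len(scores)))
    return regions


# Toy per-residue scores for a hypothetical 10-residue model.
toy_scores = [0.8, 0.8, 0.5, 0.4, 0.5, 0.7, 0.9, 0.8, 0.4, 0.3]
toy_regions = low_quality_regions(toy_scores)
print(toy_regions)  # [(3, 5), (9, 10)]
```

Reporting ranges rather than single residues makes it easier to compare the flagged stretches against named segments of the alignment, such as segment B.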

In silico expression analysis

In silico expression analysis showed that the groups formed based on gene expression patterns (Fig. 5) differed from those formed based on the mRNA sequences (Fig. 1b). For example, the gene encoding XP_010911427 (EgHd3a) exhibited the highest expression in mature leaves, whereas its close relative in coconut (KAG1367690) was highly expressed in male flowers. Data from four different tissues showed that three genes (XP_010911427, KAG1366211, and XP_010912140) were highly expressed in mature leaves, but their expression was very low in flowers. Six genes (XP_010919262, XP_010936814, XP_010912272, XP_010940015, KAG1331310, and XP_010930170) were highly expressed in the roots, whereas the other genes showed low expression there. In addition, eight genes were highly expressed in the flowers, indicating a potential role in the reproductive phase up to fertilization. XP_010907126 showed very low or no expression. As orthologs of rice Hd3a, the Hd3a-like proteins may function in determining the initiation of flowering and panicle development (Zhao et al. 2015; Endo-Higashi and Izawa 2011). In contrast, as FT-antagonizing proteins, TFLs compete with Hd3a for complex formation to initiate or inhibit flowering. TFLs also function redundantly in inflorescence meristem development (Kaneko-Suzuki et al. 2018; Yoo et al. 2010). In addition, MFT induced earlier flowering when overexpressed in Arabidopsis (Yoo et al. 2003). Interestingly, MFT was highly expressed in seeds, especially during mid- to late-seed development in Jatropha curcas (Tao et al. 2014). The difference between expression patterns and nucleotide sequence patterns probably arose because the data obtained from the NCBI database were not sampled at a defined time, even though these flowering genes act at a precise point in the initiation of flower development. Most PEBP genes exhibit diurnal oscillation patterns (Yoo et al. 2010).

Fig. 5

In silico heatmap expression profiling of PEBP genes in coconut and oil palm across seedling leaves, mature leaves, root, male flower, and female flower. Heat map color: red related to high expression, yellow to medium expression, and green to low expression. (Color figure online)

Conclusion

This study enhances our understanding of the genetic and evolutionary dynamics of the PEBP gene family in Arecaceae, particularly among oil palm, coconut, and date palm. It revealed a closer genetic affinity between oil palm and coconut than between oil palm and date palm, suggesting distinct evolutionary paths. The PEBP genes were categorized into three main groups (FT, TFL, and MFT), with key structural similarities and functional differences noted within these groups. In silico expression analysis revealed that the grouping of genes by expression pattern differed from their grouping by mRNA sequence, possibly because the expression data were not sampled at defined times. Structural and sequence variations in these proteins may have important implications for their functional roles, highlighting the close relation between protein structure and function in plant biology. This analysis of both protein and mRNA sequences offers insights into genetic evolution in palms, highlights the importance of gene duplication and diversification in these economically and ecologically significant species, and provides a basis for future research in plant biology and genetics.