Introduction

Here, we examined the evolution of the heat shock transcription factor (HSF) family in angiosperms. HSFs are the terminal components of the signal transduction chain that activates genes in response to various inducers, including high temperatures, oxidants, heavy metals, and pathogens (Swindell and others 2007; Zhang and others 2013a). They are the central regulators of the expression of heat shock proteins (HSPs) and other heat shock-induced genes that confer thermotolerance to all eukaryotes. HSPs, which were first discovered in cells exposed to heat stress, are often considered to function as a protective umbrella and act as molecular chaperones (Wang and others 2012). These proteins prevent protein unfolding and aggregation and thereby help maintain cellular protein homeostasis, which underpins critical cellular structures and functions in the stress response (Scharf and others 2012). The expression of HSPs is controlled by HSFs, which normally exist as inactive proteins (Zhang and others 2015). Like many other transcription factors, HSFs have a modular structure. They are composed of an N-terminal DNA-binding domain (DBD), characterized by a central helix-turn-helix motif that specifically binds to the heat shock elements (HSEs) in target promoters, and an adjacent bipartite oligomerization domain (HR-A/B) composed of hydrophobic heptad repeats (Giorno and others 2012; Song and others 2014). In addition, most HSFs include a cluster of basic amino acids that serves as a nuclear localization signal (NLS), a nuclear export signal (NES), and a less conserved C-terminal activator domain (CTAD) rich in aromatic, hydrophobic, and acidic amino acids, the so-called AHA motifs (Huang and others 2015; Lin and others 2014b).
In contrast to the small number of HSF genes found in Drosophila, Caenorhabditis elegans, yeasts, and other animals, the HSF system is more complex in plants than in any other organism investigated thus far (Clos and others 1990; Sorger and Pelham 1988).

Genome duplication (polyploidy) is common in many angiosperm lineages (Adams and Wendel 2005). Many modern diploid plants harbor evidence of multiple rounds of past polyploidization, often followed by massive loss of duplicated genes. Estimates of the incidence of polyploidy in angiosperms vary from 30 to 80 %, and 2 to 4 % of speciation events can be attributed to genome duplication (Otto 2007). Ancient duplication events have affected the number of genes in various gene families. Comparisons between monocotyledonous and dicotyledonous plants have shown that their genomes have undergone whole-genome duplication (Jaillon and others 2007). This older shared polyploidy event is estimated to have occurred 130–240 million years ago (Mya) (Blanc and Wolfe 2004). All modern angiosperms originate from a common hexapolyploid ancestor, and they include numerous closely related species (Kellogg 2001). These close relationships have been demonstrated in comparative genomics studies, which show that gene arrangement and order are highly conserved among these species (Paterson and others 2000).

In recent years, the HSF family has been thoroughly characterized in Arabidopsis, rice, poplar, Chinese cabbage, maize, and Malus (Giorno and others 2012; Guo and others 2008; Lin and others 2011; Mittal and others 2009; Song and others 2014), whose genomes have been sequenced. This provided a convenient opportunity for us to analyze the evolution and expansion of the HSF gene family in four model angiosperms, namely Populus trichocarpa, Arabidopsis thaliana, Oryza sativa, and Vitis vinifera, for which substantial genomic or transcriptomic information was available. The aim of this study was to determine which genes were derived from genome duplication, giving rise to paralogs, and which descended from speciation events, giving rise to orthologs. We searched for microsynteny within and between the genomes of the four model angiosperms to investigate the evolutionary history of the HSF regions. We further analyzed the evolutionary rates of, and purifying selection on, the genome segments containing HSF genes, which provided clues to the evolution of the HSF family. A further goal was to ask whether this model of protein-coding gene family evolution applies to the HSF gene family and, if so, whether there is an association between the evolution of HSF genes and changes in expression patterns that might indicate diversification of function.

Materials and Methods

Database Search and Sequence Retrieval

Sequences of Arabidopsis and rice HSFs were downloaded from the Arabidopsis Information Resource (TAIR, http://www.arabidopsis.org/, release v10.0) and the rice annotation database (http://rice.plantbiology.msu.edu/, release v5.0). We performed BLAST (Basic Local Alignment Search Tool) searches against the poplar and grapevine genome databases using 22 of the known Arabidopsis full-length HSF protein sequences as queries, and then used the resulting poplar and grapevine sequences as secondary queries. Redundant sequences were removed manually, and all candidate sequences were further analyzed to confirm the presence of the conserved Hsf-type DBD and HR-A/B regions using the Pfam (Bateman and others 2004) and SMART (Letunic and others 2004) databases. Information on sequence IDs, ORF lengths, protein sequences, genomic sequences, CDS sequences, and location coordinates was obtained from Phytozome (http://www.phytozome.net) (Goodstein and others 2012).

Physical parameters of the predicted HSF proteins including length, molecular mass (kDa), and isoelectric point (pI) for each gene were calculated with the online ExPASy programs (http://web.expasy.org/protparam/) (Gasteiger and others 2003).

Phylogenetic Analysis, Gene Structure and Conserved Motif Analysis

Multiple sequence alignment of all predicted HSF protein sequences was performed with Clustal X2.0 software using default parameters (Horiuchi 2009). To understand the evolutionary relationships between poplar, rice, Arabidopsis, and grapevine, a phylogenetic tree was constructed using the neighbor-joining (NJ) and the maximum parsimony (MP) methods in MEGA 6.0 (Tamura and others 2013). Bootstrap analysis was conducted with 1000 replicates for each node. An unrooted NJ tree of 87 full-length HSF protein sequences from poplar, rice, Arabidopsis, and grapevine was constructed. The HSF genes were classified based on their phylogenetic relationships with the corresponding Arabidopsis and rice HSF genes.

The exon/intron organization was identified using the Gene Structure Display Server (GSDS; http://gsds.cbi.pku.edu.cn/) (Guo and others 2007) by alignment of the cDNA sequences with the corresponding genomic DNA sequences. The conserved structural motifs were analyzed using the MEME online program (http://meme.sdsc.edu/meme4_4_0/intro.html) (Durrant-Whyte and Bailey 2006). The parameters were as follows: number of repetitions = any; maximum number of motifs = 30; optimum motif length = 6 to 200 residues. In addition, structural motif annotation was performed with the Pfam (http://pfam.sanger.ac.uk/search) and SMART tools (Letunic and others 2004).

Intraspecies Microsynteny Analysis

To categorize the expansion of the HSF genes, tandem duplication of multiple gene family members was examined within either the same or neighboring intergenic regions (Zhang and others 2013b). First, gene pairs (A1 and A2) belonging to the same gene family were selected as the original anchor points. The genes were located within 100 kb of each other (Wang and others 2010). Finally, we counted the protein-coding genes flanking one anchor point that had a best non-self match (E value < \(10^{-10}\)) with a protein-coding gene flanking the other anchor point. We judged a large-scale duplication event as follows: when four or more of the 15 up/downstream flanking genes of two HSF genes were matched in this way, we considered the two regions to be conserved.
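
As an illustration only, the flanking-gene criterion can be sketched in Python. The function names, the gene identifiers, and the `best_hit` dictionary (assumed to hold each gene's best non-self BLASTP match that already passed the E-value cutoff) are hypothetical and are not part of the original pipeline. The "flexible" label follows the two-or-three-gene sets mentioned in the Results.

```python
from typing import Dict, Sequence


def count_conserved_flanking(
    flank_a: Sequence[str],
    flank_b: Sequence[str],
    best_hit: Dict[str, str],
) -> int:
    """Count genes flanking anchor A whose best non-self hit flanks anchor B.

    best_hit is assumed to contain only hits that already passed the
    E-value cutoff (1e-10 in the text); building it is outside this sketch.
    """
    flank_b_set = set(flank_b)
    return sum(1 for gene in flank_a if best_hit.get(gene) in flank_b_set)


def classify_region_pair(n_conserved: int) -> str:
    """Apply the thresholds quoted in the text to a conserved-gene count."""
    if n_conserved >= 4:
        return "conserved"      # treated as a large-scale duplication
    if n_conserved >= 2:
        return "flexible"       # two or three matches (the "flexible set")
    return "not conserved"


# Hypothetical identifiers, for illustration only.
flank_a = ["a1", "a2", "a3", "a4", "a5"]
flank_b = ["b1", "b2", "b3", "b4", "b5"]
best_hit = {"a1": "b2", "a2": "b1", "a3": "b5", "a4": "b4", "a5": "x9"}
n = count_conserved_flanking(flank_a, flank_b, best_hit)
print(n, classify_region_pair(n))
```

Counting matches from one flank only is a simplification; a reciprocal check would require a second pass with the roles of the two regions exchanged.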

Microsynteny Analysis

Based on comparison with the independent regions, multiple HSF genes were subjected to microsynteny analysis across poplar, Arabidopsis, rice, and grapevine using Circos-0.54 (Chen and others 2015). Each HSF gene was used as an anchor point. Fifteen protein-encoding sequences up/downstream of the anchor points were used to identify duplicated genes in the two specific regions. A syntenic block was defined as a region in which three or more conserved homologs (BlastP, E value < \(10^{-20}\)) were located within a 100-kb region between the genomes (Sato and others 2008). When three or more such gene pairs with syntenic relationships were detected, the two regions were considered to have originated from a large-scale duplication event.
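
One possible reading of the syntenic-block definition is sketched below in Python. It assumes that the input is a list of start coordinates (in bp, on one chromosome) of homologs that already passed the BlastP cutoff; the function name and the example coordinates are hypothetical.

```python
from typing import Iterable


def has_syntenic_block(
    homolog_positions_bp: Iterable[int],
    min_homologs: int = 3,
    max_span_bp: int = 100_000,
) -> bool:
    """Return True if at least min_homologs conserved homologs fall
    within any stretch of max_span_bp on the same chromosome."""
    pos = sorted(homolog_positions_bp)
    # Slide over each run of min_homologs consecutive hits and test its span.
    for i in range(len(pos) - min_homologs + 1):
        if pos[i + min_homologs - 1] - pos[i] <= max_span_bp:
            return True
    return False


# Hypothetical coordinates: three hits within 80 kb, one distant hit.
print(has_syntenic_block([10_000, 40_000, 90_000, 500_000]))
```

Only one of the two compared regions is checked here; applying the same test to the partner region would be needed for a symmetric criterion.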

Duplication Event Dating and Evolution Analysis

For each duplicated block, we used DnaSP (version 5.10) (Librado and Rozas 2009) to calculate the mean Ks values and Ka/Ks ratios of the homologous gene pairs among the conserved flanking genes. A sliding window analysis of the Ka/Ks ratios was performed with a window size of 150 bp and a step size of 9 bp. Ks values are often used as a proxy to date large-scale duplication events (Hu and others 2010; Lin and others 2014a). To estimate absolute dates for the large-scale gene duplications, we estimated the ages of the modes of the secondary peaks of gene duplication, assuming clock-like rates of synonymous substitution of \(6.5 \times 10^{-9}\) substitutions/synonymous site/year for rice, \(1.5 \times 10^{-8}\) substitutions/synonymous site/year for Arabidopsis, and \(9.1 \times 10^{-9}\) substitutions/synonymous site/year for poplar (Blanc and Wolfe 2004; Ma and others 2014). The duplication time (T) was calculated as \( T = K_s/(2\lambda) \times 10^{-6} \) Mya (for example, \(\lambda = 6.5 \times 10^{-9}\) for rice).
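
The dating formula can be illustrated with a short Python sketch. The substitution rates are those quoted above; the function names and the example Ks values are hypothetical, and the cutoff of 2.0 for saturated Ks values follows the filter described in the Results.

```python
from typing import Iterable

# Synonymous substitution rates (substitutions/synonymous site/year) from the text.
SUBSTITUTION_RATES = {
    "rice": 6.5e-9,
    "arabidopsis": 1.5e-8,
    "poplar": 9.1e-9,
}


def mean_ks(ks_values: Iterable[float], max_ks: float = 2.0) -> float:
    """Mean Ks of a duplicated block, discarding values above max_ks
    (treated as saturated)."""
    kept = [ks for ks in ks_values if ks <= max_ks]
    if not kept:
        raise ValueError("no Ks values left after the saturation filter")
    return sum(kept) / len(kept)


def duplication_time_mya(ks: float, rate: float) -> float:
    """T = Ks / (2 * lambda), converted from years to million years (Mya)."""
    return ks / (2.0 * rate) * 1e-6


# Hypothetical Ks values for the flanking gene pairs of one rice block.
block_ks = [0.28, 0.31, 0.33, 2.40]
ks = mean_ks(block_ks)
print(round(duplication_time_mya(ks, SUBSTITUTION_RATES["rice"]), 2))
```

The factor of 2 reflects that both copies accumulate substitutions independently after the duplication.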

Microarray Data

Whole-genome microarray data for various tissues of poplar were acquired from the Gene Expression Omnibus (GEO) database (Barrett and Edgar 2006b) at the National Center for Biotechnology Information (NCBI) under accession number GSE13990. The Affymetrix microarray data originated from eight different tissue samples representing three biological replicates (Wilkins and others 2009). Probe sets corresponding to the putative PtHsf genes were identified using the online Probe Match tool available at the NetAffx Analysis Center (http://www.affymetrix.com/). When a gene had more than one probe set, the probe set with the highest match value was used. Based on Pearson coefficients with average linkage in the Genesis (version 1.75) program (Sturn and others 2002), the expression data were normalized and hierarchically clustered (Chai and others 2012). After log transformation and averaging, the data were saved as tab-delimited files and imported into Cluster (version 3.0) to generate heatmaps (Sturn and others 2002).

Plant Materials, Growth Conditions and Stress Treatments

For RNA isolation, four-week-old poplar seedlings were grown in the greenhouse (at a constant temperature of 25 ± 2 °C with a 14/10 h light/dark photoperiod). For the stress treatments, four-week-old seedlings of uniform size were transferred to a temperature-controlled environment (heat stress, 42 ± 1 °C; cold stress, 4 ± 1 °C). For the oxidative stress treatment, seedling leaves were sprayed with 10 mM H2O2 solution (Miller and others 2008). The seedling leaves were collected at 0 and 1 h after each treatment. After harvest, they were immediately frozen in liquid nitrogen and stored at −80 °C prior to further analysis. Three technical replicates were taken for each biological replicate.

Quantitative Real-Time PCR Analysis

Trizol reagent (Invitrogen, USA) was used to extract total RNA from about 0.1 g of leaves. Residual genomic DNA was removed using DNase I. Total RNA was analyzed and quantified by electrophoresis on a 2 % agarose gel. cDNAs were synthesized from 1 μg of total RNA using the PrimeScript RT Master Mix (RR036A, TaKaRa) following the manufacturer’s instructions. Gene-specific primers were designed for the 30 PtHsf genes with Primer 5.0 and then checked with NCBI (Table S1). Reactions were performed in a total volume of 20 μl containing 10 μl of 2X SYBR Premix Ex Taq™ (TaKaRa, Japan), 0.8 μl of each gene-specific primer, 0.4 μl of 50X ROX Reference Dye, 6.0 μl of ddH2O, and 2.0 μl of diluted cDNA template. The PCR conditions were as follows: an initial denaturation at 95 °C for 30 s, followed by 40 cycles of denaturation at 95 °C for 5 s and annealing at 55–60 °C for 34 s. We used three biological replicates (poplars from different lots) and three technical replicates (repeated samples of poplar from the same lot) for each sample (Guidone and others 2016). The relative expression level of each gene was calculated as \( 2^{-\Delta\Delta C_{\mathrm{T}}} \) relative to that of the untreated control plant, which was normalized to 1 (Livak and Schmittgen 2001). Statistical analyses were performed with GraphPad Prism 5.0 (Bryfczynski and Pargas 2009).
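
A minimal Python sketch of the relative quantification step (Livak and Schmittgen 2001) is given below. It assumes that each CT value is the mean of the technical replicates and that a reference (housekeeping) gene was measured in the same samples; the reference gene is not named in the text, so that input, the function name, and the CT values are all hypothetical.

```python
def relative_expression(
    ct_target_treated: float,
    ct_ref_treated: float,
    ct_target_control: float,
    ct_ref_control: float,
) -> float:
    """Relative expression by the 2^(-ddCT) method.

    dCT  = CT(target) - CT(reference), computed within each sample
    ddCT = dCT(treated) - dCT(untreated control)
    The untreated control therefore has a relative expression of 1.
    """
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_treated - delta_control)


# Hypothetical mean CT values: the target amplifies two cycles earlier
# after treatment, while the reference gene is unchanged.
print(relative_expression(22.0, 18.0, 24.0, 18.0))
```

The calculation assumes amplification efficiencies close to 100 % for both the target and the reference gene.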

Results

HSF Genes in Four Model Angiosperms

Most angiosperms fall into two large clades, Monocotyledoneae and Dicotyledoneae. The four model angiosperms, comprising one monocotyledon (O. sativa) and three dicotyledons (P. trichocarpa, A. thaliana, and V. vinifera), originate from a common paleohexapolyploid ancestor. Based on the nomenclature proposed in a previous study, we obtained 22, 25, and 30 putative HSF genes from Arabidopsis, rice, and poplar, respectively. Due to poplar genome database updates, some changes were found in the poplar sequences; these are shown in Table S1. We identified 10 HSFs in grapevine by searching the genome database with BlastP using AtHsf protein sequences as queries (Table 1, Table S2). Details of the protein length (amino acids), chromosomal location, molecular weight (Mw), and isoelectric point (pI) of each HSF gene are listed in Table S2.

Table 1 Number of HSF genes in Arabidopsis, rice, grapevine, and poplar

From the results, the number of HSF genes in poplar, Arabidopsis, and rice was greater than that in grapevine. Therefore, grapevine may have lost HSF genes faster than the other plants, or its HSF family may have expanded more slowly during evolution.

Phylogenetic and Structural Analysis of the HSF Genes

To investigate the evolutionary organization of the HSF gene family, an unrooted phylogenetic tree of 30 poplar HSFs, 22 Arabidopsis HSFs, 25 rice HSFs, and 10 grapevine HSFs was constructed with 1000 bootstrap replicates (Fig. 1a). The tree was built in MEGA 6.0 using both the neighbor-joining (NJ) and maximum parsimony (MP) methods. The tree topologies produced by the two algorithms were largely comparable, with only minor differences at interior branches (data not shown). Therefore, only the NJ phylogenetic tree was subjected to further analysis in our study (Fig. 1a). Phylogenetic analysis showed that the HSF genes from the four model angiosperms could be grouped into three classes, corresponding to classes A, B, and C. The numbers of class A proteins in poplar and Arabidopsis were higher than those in the other species, indicating that these HSF genes had specifically expanded within the poplar and Arabidopsis genomes. For instance, there were seventeen and sixteen HSF Class A genes in poplar and Arabidopsis, respectively. Based on the nomenclature in previous studies of Arabidopsis and rice, Class A and Class B were subdivided into eight (A1–8) and five (B1–5) clades according to their bootstrap values and phylogenetic relationships (Fig. 1a). In our analysis, AtHsf-19 (HSFA6a) did not cluster in clade A6 but grouped in clade A2. In addition, poplar, Arabidopsis, and grapevine each had only a single HSF gene in Class C, whereas rice had four. Thus, the Class C HSF genes appeared to have undergone expansion in the monocotyledons.

Fig. 1

Phylogenetic relationships and exon/intron organization of full-length HSF proteins from poplar, Arabidopsis, rice, and grapevine. a Unrooted phylogenetic tree generated with MEGA6.0 using the full-length predicted amino acid sequences of 87 HSF genes by the neighbor-joining (NJ) method with 1000 bootstrap replicates. Subfamilies and subclasses of HSF genes (A–B) are highlighted with different colored backgrounds and vertical bars next to the gene names on the tree. b Exon/intron organization of HSF genes. Yellow boxes represent exons and gray lines represent introns, and untranslated regions (UTRs) are indicated by blue boxes. The sizes of exons and introns can be estimated using the scale at the bottom (Color figure online)

We then compared the exon/intron organization in the coding sequences of HSF genes. Most closely related HSF members in the same subfamily shared very similar gene structures in terms of intron number or exon length. For example, all of the poplar, Arabidopsis, and rice HSF genes had two exons and one intron, except for PtHsf-14, PtHsf-24, OsHsf-08, OsHsf-09, OsHsf-24, AtHsf-01, and AtHsf-14, which gained one exon and intron, and PtHsf-14, which lost one exon and intron during the long evolutionary period (Fig. 1b). However, in grapevine, the number of introns varied from 1 to 11, although most members had one to three introns.

We further examined the exon/intron organization of 20 paralogous pairs in these four species to investigate traceable intron gain or loss within these. Among these, only two paralogous pairs (OsHsf-08/-21 and PtHsf-16/-20) showed any difference. These differences might be derived from the exon/intron loss or gain event.

We used MEME motif detection software to reveal the diversification of HSF genes (Fig. 2). The details of the 30 putative motifs are shown in Table S3. Each of the putative motifs was annotated by searching Pfam and SMART (Finn and others 2013; Letunic and others 2012). The most common motif was motif 1. As expected, most of the closely related members had common motif composition, suggesting functional similarities among HSF proteins within the same subfamily.

Fig. 2

Distribution of conserved motifs in the HSF family members. All motifs were identified by MEME using the complete amino acid sequences of poplar, Arabidopsis, rice, and grapevine HSF genes. Subfamilies and subclasses (A1–C) of HSF proteins are highlighted with colored backgrounds and colored vertical bars as in Fig. 1a to the right of the gene names. Different motifs are indicated by different colors numbered 1–30 (Color figure online)

Ongoing Individual Gene Duplication

To examine genetic divergence within each HSF family and the expansion patterns, we further studied gene duplication events in these four model angiosperms (Table 2). The angiosperm genomes had undergone at least two rounds of whole-genome duplication during their evolutionary history (Tuskan and others 2006). Gene duplication, including segmental duplication and tandem duplication, is considered to be one of the primary driving forces in the evolution of genomes (Moore and Purugganan 2005). First, we analyzed adjacent genes to determine whether tandem duplication had taken place. A pair of genes separated by five or fewer genes within a 100-kb region on a chromosome was considered a possible product of tandem duplication. According to this criterion, no pair was found to have been generated by tandem duplication. In addition, to investigate possible larger-scale duplications, we searched for gene similarity in the regions flanking the HSFs. If four or more of the 15 up- and downstream genes flanking two HSF genes achieved a best non-self match using BLASTP (E value < \(10^{-10}\)), we considered these gene pairs to be conserved and defined the two regions as derived from a large-scale duplication event (Li and others 2014). Pairs with two or three matching flanking genes were considered flexible sets. Eighteen paralogous gene pairs (31 genes) were localized to segmentally duplicated regions (Table 2). In poplar, 22 out of 30 HSF genes were located in large-scale duplicated regions, and each gene pair showed conservation between its flanking genes. However, only a few HSF-containing segments had matched duplicates in Arabidopsis and rice, that is, OsHsf-20/-22, OsHsf-10/-19, OsHsf-12/-18, OsHsf-08/-25, and AtHsf-15/-20. The observation that multiple copies of HSF were retained in poplar is reasonable from an evolutionary perspective because HSFs confer various abiotic and biotic resistance traits to plants (Ogawa and others 2007; Sakuma and others 2006).
Paralogs in poplar were produced by both the recent and the ancient duplication. In the Populus HSF duplication network, when the two genes expected from the recent duplication could not both be found, we reasoned that an ancient gene loss event had probably occurred. Previous reports (Guo and others 2014; Tang and others 2008) indicated that the grapevine genome had undergone a γ event; in theory, an ancestral gene would thus have been copied into three genes. However, in the VvHsf gene family, we found no paralogs among the 10 VvHsf genes (Table 2). We therefore suggest that many copies of HSF genes in grapevine may have been lost during the long-term evolutionary process.
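
The tandem-duplication criterion used in this section (two genes separated by five or fewer genes within a 100-kb region of the same chromosome) can be written as a short check. The sketch below is illustrative only: the `Gene` record, its fields, and the example values are hypothetical, and the physical distance is measured between start coordinates as a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start_bp: int   # start coordinate on the chromosome
    order: int      # rank among all protein-coding genes on the chromosome


def is_tandem_pair(
    a: Gene,
    b: Gene,
    max_intervening: int = 5,
    max_distance_bp: int = 100_000,
) -> bool:
    """Return True if two family members satisfy the tandem criterion."""
    if a.chrom != b.chrom:
        return False
    intervening = abs(a.order - b.order) - 1   # genes lying between a and b
    distance = abs(a.start_bp - b.start_bp)
    return 0 <= intervening <= max_intervening and distance <= max_distance_bp


# Hypothetical genes: three intervening genes, 50 kb apart.
g1 = Gene("geneA", "chr1", 10_000, 5)
g2 = Gene("geneB", "chr1", 60_000, 9)
print(is_tandem_pair(g1, g2))
```

Applying this test to every pair of family members on the same chromosome reproduces the screening step; in this study, no HSF pair met the criterion.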

Table 2 Estimates of the dates for the large-scale duplication events in poplar, rice, and Arabidopsis

Dating Duplication Events

Under the assumption that synonymous (silent) substitutions per site (Ks) accumulate at a constant rate over time, we used the conserved flanking protein-coding genes to estimate the dates of the larger-scale duplication events (Blanc and Wolfe 2004). For this analysis, we used only the duplicated blocks in our strict set (excluding flexible sets). Each pair of proteins in a duplicated block was aligned at the amino acid level, and codons from gapless aligned regions were then used to calculate Ks values. We discarded any Ks values above 2.0 because of the risk of saturation (Maher and others 2006). The mean Ks values for each duplication and the estimated dates are shown in Table 2. For instance, based on the substitution rate of \(9.1 \times 10^{-9}\) substitutions per site per year previously proposed for poplar (Ma and others 2014), the duplication events for the 13 paralogs were estimated to have occurred between 8.16 and 31.45 million years ago (Mya). Additionally, PtHsf-19, PtHsf-10, and PtHsf-05 were all related via the ancient genome duplication, but the relationships between PtHsf-17 and PtHsf-10 or PtHsf-05 were uncertain. Therefore, PtHsf-17 might be a product of a segmental duplication of PtHsf-19. These results supported the view that two whole-genome duplications played a key role in the expansion of the poplar HSF family.

Conserved Microsynteny Analysis

Using a stepwise gene-by-gene reciprocal comparison of the regions hosting the HSF genes, we observed strongly conserved microsynteny among Arabidopsis, rice, grapevine, and poplar (Fig. 3, Table S4). To provide a basic framework for analyzing the HSF genes, the 87 HSF genes, comprising 30 poplar HSFs, 22 Arabidopsis HSFs, 25 rice HSFs, and 10 grapevine HSFs, were divided into fifteen distinct groups (Fig. S1, a–c) based on phylogenetic analysis. Each group contained at least one gene from poplar, Arabidopsis, rice, and grapevine, indicating that the genes of these different species may be derived from a common ancestor. Therefore, we referred to these 15 distinct groups as orthologous groups.

Fig. 3

Extensive microsynteny of HSF gene-containing regions across poplar, Arabidopsis, rice, and grapevine chromosomes. Rice chromosomes (Os) are indicated by pink boxes. The poplar, Arabidopsis, and grapevine chromosomes are labeled Pt (green–brown), At (blue), and Vv (yellow), respectively. Numbers along each chromosome box indicate sequence lengths in megabases. All chromosomes harboring Hsf regions in these four species are shown arranged in a circle. The syntenic relationships within HSF Classes A, B, and C are represented by red, black, and green lines, respectively (Color figure online)

In each orthologous group, high levels of microsynteny were maintained between the members from three plants (Arabidopsis, grapevine, and rice) and the networks of duplicated regions in poplar, each anchored by an HSF gene. Within an orthologous group, segments from different species were thought to have shared the ancient whole-genome duplication that occurred outside of the papilionoid lineage. Two representative synteny diagrams, for groups A4 and B2, are shown in Fig. 4. In group A4, PtHsf-09/OsHsf-04 and AtHsf-20/AtHsf-15 exhibited remarkable opposite-direction microsynteny, whereas PtHsf-09/VvHsf-08 and PtHsf-04/AtHsf-20 were aligned with flanking gene pairs in the same order but discordant transcriptional orientation. A high level of microsynteny existed in group B2, with most pairs in reverse order; in particular, PtHsf-23/AtHsf-22, OsHsf-24/OsHsf-14, and AtHsf-11/VvHsf-05 showed opposite-direction microsynteny. Moreover, the duplicated pair PtHsf-23/PtHsf-27 showed clearly detectable linkage.

Fig. 4

Comparative maps of representative HSF genes and their syntenic flanking genes. The relative positions of all flanking protein-coding genes are defined by the anchored HSF genes (black). Arrows represent individual genes and their transcriptional orientations. Conserved gene pairs among the segments are connected with lines. (A4) The syntenic chromosomal intervals containing PtHsf-09, OsHsf-04, AtHsf-20, AtHsf-14, PtHsf-22, and VvHsf-08 across poplar, Arabidopsis, rice, and grapevine. (B2) The syntenic chromosomal intervals containing PtHsf-01, PtHsf-27, PtHsf-23, AtHsf-22, OsHsf-24, OsHsf-14, AtHsf-11, VvHsf-05, and VvHsf-06. The full microsynteny maps of regions containing HSF genes within poplar, Arabidopsis, rice, and grapevine are shown in Fig. S1

To estimate the extent of conserved gene content and order, synteny quality was assessed. The average synteny quality of the HSF genes across the four plants was 15.31 % (Table S4). Owing to the large number of syntenic genes shared between poplar and grapevine, the synteny quality between these genomes was 23.59 %, which was higher than that observed for the At/Vv synteny blocks (16.88 %). The lowest synteny quality (9.99 %) was found between rice and Arabidopsis. Details of these comparative analyses are shown in Table S4. These results supported the orthology of the segment groups used in this study. The extremely low linear relationship among the four model angiosperms also suggested that most HSF genes in each species are descendants of different ancestral genes (Table S4).

Strong Purifying Selection for HSF Genes in Angiosperms

To better understand the evolutionary history of the HSF gene family in angiosperms, we measured the Ka/Ks ratios of HSF paralogs in the network of duplicated regions of poplar, Arabidopsis, and rice. Grapevine was excluded from this analysis due to the lack of HSF paralog pairs. Interestingly, the resulting data showed that all paralog pairs have Ka/Ks << 1 (Table S5, Fig. 5), indicating that the HSF family mainly evolved under purifying selection. To obtain further insight into the selection pressure, we performed a sliding window analysis of Ka/Ks between each pair of HSF paralogs. As expected from the basic Ka/Ks analysis, sliding window analysis clearly showed that numerous sites/regions were under moderate to strong negative selection (Fig. 6). However, there were still some conserved regions of HSFs subjected to positive selection, with Ka/Ks ratios above 1 (for example, PtHsf-17/-19 and OsHsf-10/-19).
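
The sliding-window scheme (150-bp windows advanced in 9-bp steps) can be illustrated by enumerating the window coordinates along a codon alignment. This Python sketch only generates the windows; the Ka/Ks calculation itself, which was performed with DnaSP in this study, is not reproduced. The function name and the requirement that window and step be multiples of three (so that each window stays in the codon frame) are assumptions of the sketch.

```python
from typing import List, Tuple


def sliding_windows(
    alignment_length_bp: int,
    window_bp: int = 150,
    step_bp: int = 9,
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) coordinates of each full window.

    A 150-bp window covers 50 codons and a 9-bp step advances 3 codons,
    so every window begins on a codon boundary.
    """
    if window_bp % 3 or step_bp % 3:
        raise ValueError("window and step must be multiples of 3")
    starts = range(0, alignment_length_bp - window_bp + 1, step_bp)
    return [(start, start + window_bp) for start in starts]


# Hypothetical 300-bp gapless codon alignment.
windows = sliding_windows(300)
print(len(windows), windows[0], windows[-1])
```

A Ka/Ks value would then be computed for the pair of aligned sequences within each window and plotted against the window midpoint.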

Fig. 5

Estimates of Ka/Ks ratios in pairwise comparisons. The Ka/Ks ratios of the duplicated HSF genes and their flanking paralogs in poplar, Arabidopsis, rice, and grapevine are shown in the scatter plots; the y and x axes denote the Ka/Ks ratio and synonymous distance for each pair, respectively

Fig. 6

Sliding window plots of duplicated HSF genes. The window size is 150 bp, and the step size is 9 bp. The same subfamily is highlighted with same colored backgrounds

The results showed that the HSF family in angiosperms was under purifying selection and that these genes were under strong evolutionary constraints, which maintained their stable function. Additionally, a few conserved coding regions in the HSF gene family underwent positive selection, which might indicate the acquisition of new functions by these HSF genes.

Expression Patterns of HSF Genes in Poplar

To gain insight into the possible functions of HSF genes, we re-analyzed the expression of poplar HSF genes using microarray data (Fig. 7). We identified probe sets for 28 of the 30 PtHsf genes in the microarray data downloaded from NCBI (accession number GSE13990). No expression data for two genes (PtHsf-15 and PtHsf-18) were available on the platform. Duplicated genes often have different evolutionary fates, that is, nonfunctionalization, neofunctionalization, or subfunctionalization, which may be indicated by divergence in their expression patterns (Wang and others 2015b). From these results, we identified nine paralogous gene pairs in the poplar HSF gene family. As shown in Fig. 7, most pairs of paralogs in poplar shared similar expression patterns, suggesting functional redundancy. Seven paralogous pairs (PtHsf-22/-09, PtHsf-04/-29, PtHsf-07/-02, PtHsf-20/-16, PtHsf-10/-05, PtHsf-08/-21, and PtHsf-06/-25) derived from segmental duplication shared almost identical expression patterns with respect to different tissues and various stresses. In contrast, two paralogous pairs (PtHsf-19/-17 and PtHsf-23/-27) showed dissimilar expression patterns. For example, PtHsf-19 showed the highest transcript abundance in female catkins, whereas its duplicated counterpart PtHsf-17 was preferentially expressed in xylem. The specific and varied expression profiles of HSF genes in poplar tissues suggested that they play diverse and novel roles, which was in agreement with the results of the phylogenetic and protein motif analyses.

Fig. 7

Expression profiles of HSF genes in poplar across different tissues. Heatmap showing hierarchical clustering of 28 PtHsf genes across the tissues analyzed, exhibiting their obvious tissue-specific expression patterns. The color scale at the right of the picture represents log2 expression values; blue represents low and red indicates high transcript abundance. Microarray data under the series accession number GSE13990 were obtained from the NCBI GEO database. Two connected genes in the same color indicate duplicated gene pairs. S seedling; ES etiolated seedling; YL young leaf; R root; DX xylem; ML mature leaf; FC female catkins; MC male catkins (Color figure online)

Previous studies have shown that nine PtHsf genes respond to abiotic stresses including heat, NaCl, and CdCl2 (Zhang and others 2013a). To further investigate the function of the HSF gene family and their duplicated genes under different abiotic stress conditions, we performed qRT-PCR analysis of 30 HSF genes using RNA isolated from the leaves of 4-week-old poplar subjected to heat, cold, and H2O2 stress treatments (Jian and others 2008). The qRT-PCR results showed that all 30 genes were clearly up- or down-regulated compared to the controls (> 2-fold or < 0.5-fold) in at least one of the stress conditions examined (Fig. 8). By comparing the expression patterns of 20 segmentally duplicated genes, we found that nine duplicated gene pairs shared similar expression patterns under the three stress treatments. Among these genes, most were responsive to more than one stress treatment. For example, PtHsf-15 was significantly up-regulated by all three stresses. Under heat treatment, all paralogous gene pairs showed similar up- or down-regulation, except for PtHsf-07/-02: PtHsf-02 was dramatically up-regulated, whereas PtHsf-07 was conspicuously down-regulated. Under cold treatment, only two duplicated gene pairs (PtHsf-03/-18 and PtHsf-11/-28) showed different expression; for example, PtHsf-03 was up-regulated and PtHsf-18 was down-regulated. Under H2O2 treatment, the expression profiles of nine duplicated gene pairs were similar, whereas that of PtHsf-08/-21 differed: PtHsf-08 was slightly up-regulated, whereas PtHsf-21 was obviously down-regulated. It is noteworthy that seven genes (PtHsf-02, PtHsf-05, PtHsf-10, PtHsf-14, PtHsf-15, PtHsf-17, and PtHsf-19) strongly responded to heat stress in our experiments, indicating that they could have important roles in the heat shock regulatory network.
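
The fold-change thresholds used here (more than two-fold for up-regulation, less than 0.5-fold for down-regulation) can be applied as in the following Python sketch. The function names and the example values are hypothetical, and treating a pair as divergent whenever its two members fall into different categories is an assumption of the sketch rather than the stated criterion of this study.

```python
def classify_fold_change(fold: float, up: float = 2.0, down: float = 0.5) -> str:
    """Classify a relative expression value measured against the control (= 1)."""
    if fold > up:
        return "up"
    if fold < down:
        return "down"
    return "unchanged"


def pair_diverges(fold_a: float, fold_b: float) -> bool:
    """Return True if two paralogs fall into different response categories."""
    return classify_fold_change(fold_a) != classify_fold_change(fold_b)


# Hypothetical relative expression values for one duplicated pair.
print(classify_fold_change(4.0), classify_fold_change(0.3), pair_diverges(4.0, 0.3))
```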

Fig. 8

Expression of poplar HSF genes in response to abiotic stress measured by quantitative real-time PCR. The mRNA level of each gene in young poplar leaves is given for heat (HS: 42 °C), cold (4 °C), and oxidative (OS: 10 mM H2O2) stress (1 h treatments) and plotted relative to the value obtained for the unstressed control. Error bars represent standard errors. Duplicated gene pairs within the same subfamily are highlighted with the same background color

Discussion

In this study, we performed a genome-wide comparison of HSF members from eudicots (P. trichocarpa, A. thaliana, and V. vinifera) and a monocot (O. sativa) to reconstruct gene gain and loss in angiosperms. The phylogenetic tree was informative for inferring the number of HSFs in the most recent common ancestor of the four species analyzed in this study (Nam and others 2004). Based on phylogenetic tree analysis, the HSF genes from the four species could be delineated into 14 well-supported ancient gene lineages, although representatives of one or two species are missing from some clades. Consequently, this analysis revealed that there were at least 14 HSF genes in the most recent common ancestor of these four species (Nam and others 2004). However, there was considerable variability within each clade, indicating lineage-specific gene gain and loss (Zhou and others 2000). Except for the shared clade representing Class C HSFs, all of the clades were confirmed by the subsequent interspecies microsynteny analysis (Wang and others 2015a).

Phylogenetic analysis of the HSF gene family revealed that the exon/intron structures and motif compositions of these genes are relatively conserved (Chen and others 2014; Feng and others 2014). Most closely related genes within the same subfamily shared similar gene structures. However, some motifs with unknown functions were present in nearly every subgroup, and these motifs might play important roles in the HSF family.

For large-scale duplication, we identified paralogous gene pairs in each species and used the level of synonymous nucleotide substitution to estimate the relative ages of gene duplication (Li and others 2014). We identified 22, eight, and two duplicated genes in poplar, rice, and Arabidopsis, which accounted for 73.3, 32, and 9 % of the total HSF genes in each species, respectively (excluding the flexible set). The grapevine genome has not undergone a recent genome duplication. The retention rate for PtHsf genes was higher than the average rate (~32 %) that resulted from the salicoid duplication event, suggesting that expansion of the PtHsf gene family was mainly due to the salicoid genome-wide duplication and rearrangement events. In our analysis, all duplicated HSF gene pairs were involved in large-scale duplication events, and there were no tandemly duplicated gene pairs, which indicated that large-scale duplication (Messing and others 2004) rather than tandem duplication served as the most important driving force throughout the long period of poplar HSF gene evolution. After the shared polyploidy event and following divergence from the other angiosperms, the lineage leading to present-day poplar is known to have undergone a second whole-genome duplication approximately 13 Mya (Tuskan and others 2006). By calculating the duplication dates for the paralogous pairs, we concluded that all of the large-scale duplication events in the PtHsf gene family occurred between 8.16 and 31.45 Mya (Maher and others 2006); this duplication did not occur in the other three angiosperm lineages. Among the poplar HSF genes, eight (PtHsf-01, -12, -13, -14, -15, -24, -26, and -30) were found as single copies on duplication blocks. These results suggested that large-scale duplication had occurred as a continuous process and that dynamic changes might have occurred in a chromosomal segment that contained two ancestral HSF genes, leading to loss of the corresponding duplicated counterpart (Guo and others 2014). In addition, all grapevine genes were found as single copies on duplication blocks.
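Dating a duplication from synonymous substitutions is commonly done with the relation T = Ks / (2λ), where λ is the synonymous substitution rate per site per year. The Python sketch below assumes that relation; the function name, the Ks value, and the rate are placeholders chosen for illustration and are not the values used in this study.

```python
def duplication_age_mya(ks, rate_per_site_per_year):
    """Approximate duplication age in million years, using T = Ks / (2 * lambda)."""
    if ks < 0 or rate_per_site_per_year <= 0:
        raise ValueError("Ks must be non-negative and the rate must be positive")
    years = ks / (2.0 * rate_per_site_per_year)
    return years / 1e6


# Hypothetical inputs: Ks = 0.25 and an assumed rate of 1e-8 substitutions per site per year.
age = duplication_age_mya(0.25, 1e-8)
```

Because the estimate scales inversely with λ, the choice of rate strongly affects the inferred dates; the factor of 2 accounts for substitutions accumulating independently in both gene copies.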

Synteny-based approaches to clarify comparative genome arrangements, originally based on genetic maps, grow in value as we gain finer-scale information derived from physical maps and whole-genome sequences (Schlueter and others 2008). Angiosperms present both challenges and opportunities for synteny-based genome comparisons. In the fifteen orthologous groups, genes from poplar, Arabidopsis, rice, and grapevine exhibited clear microsynteny, indicating that the HSF genes existed before the divergence of the three dicotyledon genomes (poplar, Arabidopsis, and grapevine) and the one monocotyledon genome (rice) (Cannon and others 2003; Wang and others 2015a). The analysis of microsynteny could help unravel the actual evolutionary relationships between HSF regions among the four species by taking advantage of the surrounding genomic sequences. However, small insertions/deletions and inversions were observed between regions after the divergence in most HSF genes (Fig S1, a–c). Almost all Hsf-containing regions in dicotyledons (poplar, Arabidopsis, and grapevine) showed relatively close relationships with the orthologous duplicated regions in the monocotyledon (rice). In most cases, a single region of Arabidopsis, rice, and grapevine was syntenic to two or more duplicated regions of poplar (Jaillon and others 2007). This indicated that the paleohexaploidy observed in Arabidopsis, rice, and grapevine was already present in their common ancestor with poplar.

Synteny and collinearity analyses of plant genomes have suggested that an ancient genome triplication (γ-triplication) event occurred in the common ancestor of Arabidopsis, rice, poplar, and grapevine, resulting in a paleohexaploid (Tang and others 2008). After the γ-triplication event, Arabidopsis was affected by two more recent paleopolyploidy events: the β- and α-duplications. The latter was the most recent, occurring approximately 40 million years (MYs) ago (Blanc and others 2003). In poplar, there was a duplication event specific to its own salicoid lineage (ρ-duplication), which occurred between the β- and α-duplication events about 65 MYs ago (Tuskan and others 2006). Rice experienced one lineage-specific duplication event (gramineous differentiation) in addition to the γ-triplication (Guyot and Keller 2004). By contrast, grapevine experienced only the common γ-triplication event and no subsequent polyploidies. Based on the above discussion, the multiplicity ratio for an ancestral gene in Arabidopsis, rice, poplar, and grapevine should be 4:2:2:1. In fact, we retrieved 22, 25, 30, and ten putative HSF genes from Arabidopsis, rice, poplar, and grapevine, so the current ratio of HSF genes among the four species was 2.2:2.5:3:1. Relative to the expected multiplicity, poplar showed the highest current ratio (3 observed versus 2 expected), and the other three species showed lower observed-to-expected ratios than poplar. Therefore, HSF genes may have expanded faster in poplar than in the other three species, or poplar may have lost fewer genes than did the other species.
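The ratio arithmetic in this paragraph can be reproduced in a few lines. This Python sketch uses only the duplication histories and gene counts stated in the text; the variable names are illustrative, and each duplication after the γ-triplication is assumed to double the expected copy number.

```python
# Lineage-specific duplications after the shared gamma-triplication, as described in the text.
duplications_after_gamma = {"Arabidopsis": 2, "rice": 1, "poplar": 1, "grapevine": 0}

# Putative HSF gene counts reported in the text.
hsf_counts = {"Arabidopsis": 22, "rice": 25, "poplar": 30, "grapevine": 10}

# Expected multiplicity: each whole-genome duplication doubles the copy number.
expected = {sp: 2 ** n for sp, n in duplications_after_gamma.items()}

# Observed ratio, normalized to grapevine (no polyploidy after gamma).
observed = {sp: n / hsf_counts["grapevine"] for sp, n in hsf_counts.items()}

# Observed relative to expected; values above 1 mean more genes than the simple model predicts.
retention = {sp: observed[sp] / expected[sp] for sp in hsf_counts}
highest = max(retention, key=retention.get)
```

Under these assumptions, expected gives 4:2:2:1, observed gives 2.2:2.5:3:1, and poplar has the largest observed-to-expected value.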

The Ka/Ks ratios of the 18 duplicated gene pairs showed that these pairs were subjected to purifying selection. Furthermore, the relatively higher Ka/Ks ratios for the PtHsf-19/-17 and PtHsf-19/-10 gene pairs suggested that they experienced rapid evolutionary diversification following duplication (Zhang and others 2006). Purifying selection probably played a key role in maintaining the long-term stability of the biological structures of plant HSFs by removing deleterious mutations, thus ensuring that gene functions were maintained as long as they were needed.
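By convention, a Ka/Ks ratio below 1 is read as purifying selection, a ratio near 1 as neutral evolution, and a ratio above 1 as positive selection. The Python sketch below applies that convention; the function name, the tolerance band around 1, and the example Ka and Ks values are assumptions for illustration and are not taken from this study.

```python
def selection_mode(ka, ks, tolerance=0.05):
    """Return (Ka/Ks, label) using the conventional thresholds around a ratio of 1."""
    if ks <= 0:
        raise ValueError("Ks must be positive to compute Ka/Ks")
    ratio = ka / ks
    if ratio < 1.0 - tolerance:
        return ratio, "purifying"
    if ratio > 1.0 + tolerance:
        return ratio, "positive"
    return ratio, "neutral"


# Hypothetical Ka and Ks values for one duplicated pair (not data from this study).
ratio, mode = selection_mode(0.05, 0.25)
```

The tolerance parameter is only a convenience for treating ratios very close to 1 as neutral; a formal analysis would test the ratio statistically rather than apply a fixed band.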

Our understanding of HSF evolution served as a starting point for elucidating their complex regulatory roles. Expression data provided some insight into the functional divergence of duplicated HSF genes by capturing differences in specific tissues (Barrett and Edgar 2006a). We chose to use the GSE13990 data set because it could distinguish between different HSF gene loci and had nine different tissue-specific libraries for comparing expression profiles. The expression data for PtHsf-15 and PtHsf-18 were not included in the database. This may reflect low or cell-specific expression of these two HSF genes, or limited sensitivity of the assay. Therefore, the expression data served as a good starting point for understanding the expression patterns within HSF families, but would need to be expanded to give a complete picture of the temporal and spatial patterns of HSF gene expression. Most of the duplicated gene pairs that shared high levels of sequence homology showed similar expression patterns, e.g., PtHsf-04/-29, PtHsf-22/-09, and PtHsf-25/-06, suggesting that these duplicated genes did not diverge substantially after duplication and had retained redundant functions in regulating tissue development. In contrast, only two duplicated gene pairs (PtHsf-17/-19 and PtHsf-23/-27) showed dissimilar expression patterns. Overall, we demonstrated that plant HSF gene families evolved through duplication events similar to those that drove the evolution of other protein-coding genes, and that some duplicated copies have taken on new expression patterns, potentially resulting in neo- and subfunctionalization. The evolutionary relationships within an HSF family, in conjunction with public expression data, enabled us to explore the subsequent functional divergence of duplicated genes and could be used for further experimental analysis of their interactions with target mRNA and the resulting regulatory effects in plant development.

Recent studies have reported that the expression of HSF genes was strongly induced by heat, cold, salt, and osmotic stress in Arabidopsis, rice, and Lonicera (Mittal and others 2009; Nishizawa and others 2006; Peng and others 2015). In our study, poplar seedlings were subjected to three different stress treatments, and the results suggested that heat induced HSF expression more strongly than oxidative stress and cold (Lin and others 2011). We found three subclasses, HsfA2 (PtHsf-14), HsfA6 (PtHsf-17 and PtHsf-19), and HsfA7 (PtHsf-05 and PtHsf-10), that showed markedly higher expression following heat stress treatment (Nishizawa and others 2006; Schramm and others 2006). Our results indicated that these three subclasses played a key role in the heat shock response in poplar. Notably, only one HSF gene assigned to Class B (PtHsf-15) appeared strongly up-regulated by heat stress. The HSF proteins belonging to Class B lack certain structural features of the Class A activator HSFs. This supports the proposal that Class B HSFs act as transcriptional co-repressors cooperating with Class A HSFs. It is possible that the HSF genes with unaltered or down-regulated expression lie downstream in the hierarchy of events involved in the heat shock response, or that they were repressed by other members of the gene family. It is noteworthy that many poplar HSF genes induced by heat stress were also induced by oxidative stress, and PtHsf-15 was induced by all three stresses tested. These findings support the suggestion that HSFs serve as important sensors for H2O2 in plants and could be pivotal in linking the heat shock response with other stress-responsive signaling networks (Miller and others 2008). Most duplicated gene pairs shared similar expression patterns under the three stress treatments in this study, suggesting that these sets of duplicated HSF genes may perform a basic, important role in poplar.

In summary, based on sequence similarity, phylogenetic distance, and gene collinearity, we tracked the differential retention and expansion of ancestral HSF genes among four model angiosperms. The results revealed that most of the HSF genes in each species are descendants of different ancestral genes. This study also highlighted the importance of learning the origin of a gene when analyzing its function across different species.