Abstract
Whole-genome duplication events have played important roles in the evolution of angiosperms. Many modern angiosperm genomes have experienced one or more rounds of paleopolyploidy. Given that an ancient large-scale duplication results in an excess of relatively old duplicated genes of similar age, we analyzed the timing of duplicated gene pairs in four model angiosperms (Populus trichocarpa, Arabidopsis thaliana, Oryza sativa, and Vitis vinifera). By searching for intraspecies microsynteny, we identified 18 paralogous heat shock transcription factor (HSF) gene pairs in duplicated segments resulting from whole-genome duplication in poplar, Arabidopsis, and rice. Notably, no paralogous gene pairs were found in grapevine. However, by comparing interspecies microsynteny, we confirmed that the majority of HSF-containing segments in the Arabidopsis, rice, and grapevine genomes show extensive conservation with duplicated regions in poplar. Microarray and quantitative reverse transcriptase PCR analyses showed that most of the poplar HSF genes are differentially expressed upon exposure to various stresses. In conclusion, this study comprehensively analyzed the evolution and duplication relationships of HSF genes across four model angiosperms and revealed differences in the rates of gene gain among species.
Introduction
Here, we examined the evolution of the heat shock transcription factor (HSF) family in angiosperms. HSFs are the terminal components of the signal transduction chain that activates genes in response to various inducers, including high temperatures, oxidants, heavy metals, and pathogens (Swindell and others 2007; Zhang and others 2013a). They are the central regulators of the expression of heat shock proteins (HSPs) and other heat shock-induced genes that confer thermotolerance in all eukaryotes. HSPs, which were first discovered in cells exposed to heat stress, are often considered to function as a protective umbrella and act as molecular chaperones (Wang and others 2012). These proteins prevent protein unfolding and aggregation and thereby help maintain cellular protein homeostasis, which preserves critical cellular structures and functions during the stress response (Scharf and others 2012). The expression of HSPs is controlled by HSFs, which normally exist as inactive proteins (Zhang and others 2015). Similar to many other transcription factors, the HSF family has a modular structure. HSFs are composed of an N-terminal DNA-binding domain (DBD), characterized by a central helix-turn-helix motif that specifically binds to the heat shock elements (HSEs) in target promoters, and an adjacent bipartite oligomerization domain (HR-A/B) composed of hydrophobic heptad repeats (Giorno and others 2012; Song and others 2014). In addition, most HSFs include clusters of basic amino acids that serve as a nuclear localization signal (NLS), a nuclear export signal (NES), and a less conserved C-terminal activator domain (CTAD) rich in aromatic, hydrophobic, and acidic amino acids, the so-called AHA motifs (Huang and others 2015; Lin and others 2014b).
In contrast to the small number of HSF genes found in yeasts and in animals such as Drosophila and Caenorhabditis elegans, the HSF system is more complex in plants than in any other organism investigated thus far (Clos and others 1990; Sorger and Pelham 1988).
Genome duplication (polyploidy) is common in many angiosperm lineages (Adams and Wendel 2005). Many modern diploid plants harbor evidence of multiple rounds of past polyploidization, often followed by massive loss of duplicated genes. Estimates of the incidence of polyploidy in angiosperms vary from 30 to 80 %, and 2 to 4 % of speciation events can be attributed to genome duplications (Otto 2007). Ancient duplication events have affected the number of genes in various gene families. Comparisons between monocotyledonous and dicotyledonous plants have shown that their genomes have undergone whole-genome duplication (Jaillon and others 2007). This older shared polyploidy event is estimated to have occurred 130–240 million years ago (Mya) (Blanc and Wolfe 2004). All modern angiosperms originate from a common hexapolyploid ancestor and include numerous closely related species (Kellogg 2001). This close relationship has been demonstrated in comparative genomics studies, which show that gene arrangement and order are highly conserved among these species (Paterson and others 2000).
In recent years, the HSF family has been thoroughly characterized in Arabidopsis, rice, poplar, Chinese cabbage, maize, and Malus (Giorno and others 2012; Guo and others 2008; Lin and others 2011; Mittal and others 2009; Song and others 2014), whose genomes have been sequenced. This provided a convenient opportunity to analyze the evolution and expansion of the HSF gene family in four model angiosperms, namely Populus trichocarpa, Arabidopsis thaliana, Oryza sativa, and Vitis vinifera, for which substantial genomic or transcriptomic information is available. The aim of this study was to determine which genes were derived from genome duplication, giving rise to paralogs, and which descended from speciation events, giving rise to orthologs. We searched for microsynteny within and between the genomes of the four model angiosperms to investigate the evolutionary history of the HSF regions. We further analyzed the evolutionary rate and purifying selection acting on the genome segments containing HSF genes, which provided clues to the evolution of the HSF family. We also asked whether this model of protein-coding gene family evolution applies to the HSF gene family and, if so, whether there is an association between the evolution of HSF genes and changes in expression patterns that might indicate diversification of function.
Materials and Methods
Database Search and Sequence Retrieval
Sequences of Arabidopsis and rice HSFs were downloaded from the Arabidopsis Information Resource (TAIR, http://www.arabidopsis.org/, release v10.0) and the rice annotation database (http://rice.plantbiology.msu.edu/, release v5.0). We performed BLAST (Basic Local Alignment Search Tool) searches against the poplar and grapevine genome databases using 22 of the known Arabidopsis full-length HSF protein sequences as queries, and then used the resulting poplar and grapevine sequences as secondary queries. Redundant sequences were removed manually, and all candidate sequences were further analyzed to confirm the presence of the conserved Hsf-type DBD and HR-A/B regions using the Pfam (Bateman and others 2004) and SMART (Letunic and others 2004) databases. Information on sequence IDs, ORF lengths, protein sequences, genomic sequences, CDS sequences, and location coordinates was obtained from Phytozome (http://www.phytozome.net) (Goodstein and others 2012).
Physical parameters of the predicted HSF proteins including length, molecular mass (kDa), and isoelectric point (pI) for each gene were calculated with the online ExPASy programs (http://web.expasy.org/protparam/) (Gasteiger and others 2003).
Phylogenetic Analysis, Gene Structure and Conserved Motif Analysis
Multiple sequence alignment of all predicted HSF protein sequences was performed with Clustal X2.0 software using default parameters (Horiuchi 2009). To understand the evolutionary relationships between poplar, rice, Arabidopsis, and grapevine, a phylogenetic tree was constructed using the neighbor-joining (NJ) and the maximum parsimony (MP) methods in MEGA 6.0 (Tamura and others 2013). Bootstrap analysis was conducted with 1000 replicates for each node. An unrooted NJ tree of 87 full-length HSF protein sequences from poplar, rice, Arabidopsis, and grapevine was constructed. The HSF genes were classified based on their phylogenetic relationships with the corresponding Arabidopsis and rice HSF genes.
The exon/intron organization was identified using the Gene Structure Display Server (GSDS; http://gsds.cbi.pku.edu.cn/) (Guo and others 2007) by aligning the cDNA sequences with the corresponding genomic DNA sequences. The conserved structural motifs were analyzed using the MEME online program (http://meme.sdsc.edu/meme4_4_0/intro.html) (Durrant-Whyte and Bailey 2006) with the following parameters: number of repetitions = any; maximum number of motifs = 30; and optimum motif length between 6 and 200 residues. In addition, structural motifs were annotated with Pfam (http://pfam.sanger.ac.uk/search) and SMART tools (Letunic and others 2004).
Intraspecies Microsynteny Analysis
To categorize the expansion of the HSF genes, we first examined tandem duplication, in which multiple members of a gene family occur within the same or neighboring intergenic regions (Zhang and others 2013b). To detect large-scale duplication, pairs of genes (A1 and A2) belonging to the same gene family were selected as the original anchor points. The genes were located within 100 kb of each other (Wang and others 2010). We then counted the protein-coding genes flanking one anchor point that had a best non-self match (E value < \(10^{-10}\)) with a protein-coding gene flanking the other anchor point. When four or more of the 15 up/downstream flanking genes of two HSF genes matched in this way, we considered the two regions to be conserved and to have resulted from a large-scale duplication event.
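The flanking-gene criterion can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names, the `best_hit` dictionary (assumed to hold each gene's best non-self BLASTP match, already filtered at E value < 1e-10), and all gene identifiers are hypothetical.

```python
from typing import Dict, List


def count_conserved_flanking(flank_a: List[str], flank_b: List[str],
                             best_hit: Dict[str, str]) -> int:
    """Count genes flanking anchor A whose best non-self match flanks anchor B."""
    targets = set(flank_b)
    return sum(1 for gene in flank_a if best_hit.get(gene) in targets)


def regions_conserved(flank_a: List[str], flank_b: List[str],
                      best_hit: Dict[str, str], min_shared: int = 4) -> bool:
    """Apply the 'four or more flanking genes' rule to a pair of anchor regions."""
    return count_conserved_flanking(flank_a, flank_b, best_hit) >= min_shared


# Hypothetical flanking genes around two anchor HSF genes (identifiers invented).
flank_a = [f"a{i}" for i in range(1, 16)]
flank_b = [f"b{i}" for i in range(1, 16)]
# Hypothetical best non-self matches; "x1" lies outside the second region.
best_hit = {"a1": "b3", "a4": "b2", "a7": "b9", "a9": "b12", "a10": "x1"}

print(count_conserved_flanking(flank_a, flank_b, best_hit))
print(regions_conserved(flank_a, flank_b, best_hit))
```

Keeping the BLAST filtering outside the function means the threshold logic can be checked independently of how the matches were produced.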
Microsynteny Analysis
Based on comparison with the independent regions, multiple HSF genes were subjected to microsynteny analysis across poplar, Arabidopsis, rice, and grapevine using Circos-0.54 (Chen and others 2015). Each HSF gene was used as an anchor point, and the 15 protein-coding sequences up/downstream of each anchor point were used to identify duplicated genes in the two regions being compared. A syntenic block was defined as a region in which three or more conserved homologs (BlastP, E value < \(10^{-20}\)) were located within a 100-kb region between the genomes (Sato and others 2008). When three or more such gene pairs with syntenic relationships were detected, the two regions were considered to have originated from a large-scale duplication event.
Duplication Event Dating and Evolution Analysis
For each duplicated block, we calculated the mean Ks values and Ka/Ks ratios of the homologous gene pairs among the conserved flanking genes using DnaSP (version 5.10) (Librado and Rozas 2009). A sliding window analysis of the Ka/Ks ratios was performed with a window size of 150 bp and a step size of 9 bp. Ks values are often used as a proxy to date large-scale duplication events (Hu and others 2010; Lin and others 2014a). To estimate absolute dates for the large-scale gene duplications, we estimated the ages of the modes of the secondary peaks of gene duplication, assuming clock-like rates of synonymous substitution of \(6.5 \times 10^{-9}\) substitutions/synonymous site/year for rice, \(1.5 \times 10^{-8}\) substitutions/synonymous site/year for Arabidopsis, and \(9.1 \times 10^{-9}\) substitutions/synonymous site/year for poplar (Blanc and Wolfe 2004; Ma and others 2014). The duplication time (T) was calculated as \( T = K_s / (2\lambda) \times 10^{-6} \) Mya (for example, \(\lambda = 6.5 \times 10^{-9}\) for rice).
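As a worked illustration of the dating formula, the following Python sketch converts a mean Ks value into a duplication time in Mya using the three substitution rates given above. The function and dictionary names are hypothetical, and the Ks values in the example are chosen for illustration only, not taken from Table 2.

```python
# Synonymous substitution rates (substitutions/synonymous site/year) from the text.
RATES = {
    "rice": 6.5e-9,
    "Arabidopsis": 1.5e-8,
    "poplar": 9.1e-9,
}


def duplication_time_mya(ks: float, rate: float) -> float:
    """Return T = Ks / (2 * lambda) * 1e-6, the duplication time in Mya."""
    if rate <= 0:
        raise ValueError("substitution rate must be positive")
    return ks / (2.0 * rate) * 1e-6


# Illustrative Ks values (hypothetical inputs).
print(round(duplication_time_mya(0.13, RATES["rice"]), 2))     # 10.0
print(round(duplication_time_mya(0.1485, RATES["poplar"]), 2))  # 8.16
```

The factor of 2 reflects that substitutions accumulate independently along both duplicated copies after the duplication event.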
Microarray Data
Whole-genome microarray data for various tissues of poplar were acquired from the Gene Expression Omnibus (GEO) database (Barrett and Edgar 2006b) at the National Center for Biotechnology Information (NCBI) under accession number GSE13990. The Affymetrix microarray data originated from eight different tissue samples, each represented by three biological replicates (Wilkins and others 2009). Probe sets corresponding to the putative PtHsf genes were identified using the online Probe Match tool available at the NetAffx Analysis Center (http://www.affymetrix.com/). When a gene had more than one probe set, the probe with the highest match value was used. The expression data were normalized and hierarchically clustered based on Pearson coefficients with average linkage in the Genesis (version 1.75) program (Sturn and others 2002; Chai and others 2012). The data were then log-transformed and averaged, saved as tab-delimited files, and imported into Cluster (version 3.0) to generate heatmaps (Sturn and others 2002).
Plant Materials, Growth Conditions and Stress Treatments
For RNA isolation, poplar seedlings were grown for four weeks in the greenhouse (constant temperature of 25 ± 2 °C and a 14/10 h light/dark photoperiod). For the stress treatments, four-week-old seedlings of uniform size were transferred to a temperature-controlled environment (heat stress, 42 ± 1 °C; cold stress, 4 ± 1 °C). For the oxidative stress treatment, seedling leaves were sprayed with 10 mM H2O2 solution (Miller and others 2008). Leaves were collected at 0 and 1 h after each treatment, immediately frozen in liquid nitrogen, and stored at −80 °C prior to further analysis. Three technical replicates were taken for each biological replicate.
Quantitative Real-Time PCR Analysis
Trizol reagent (Invitrogen, USA) was used to extract total RNA from about 0.1 g of leaves. Residual genomic DNA was removed using DNase I. Total RNA was analyzed and quantified by electrophoresis on a 2 % agarose gel. cDNAs were synthesized from 1 μg of total RNA using the PrimeScript RT Master Mix (RR036A, TaKaRa) following the manufacturer’s instructions. Gene-specific primers were designed for 30 PtHsf genes with Primer 5.0 and then checked with NCBI (Table S1). Reactions were performed in a total volume of 20 μl containing 10 μl of 2X SYBR Premix Ex Taq™ (TaKaRa, Japan), 0.8 μl of each gene-specific primer, 0.4 μl of 50X ROX Reference Dye, 6.0 μl of ddH2O, and 2.0 μl of diluted cDNA template. The PCR conditions were as follows: an initial denaturation at 95 °C for 30 s, followed by 40 cycles of denaturation at 95 °C for 5 s and annealing at 55–60 °C for 34 s. We used three biological replicates (poplars from different lots) and three technical replicates (replicate poplars from the same lot) for each sample (Guidone and others 2016). The relative expression level of each gene was expressed as \( 2^{-\Delta\Delta C_{T}} \) in comparison to that of the untreated control plant, which was normalized to 1 (Livak and Schmittgen 2001). Statistical analyses were performed with GraphPad Prism 5.0 (Bryfczynski and Pargas 2009).
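The relative quantification step can be illustrated with a short Python sketch of the \( 2^{-\Delta\Delta C_{T}} \) calculation (Livak and Schmittgen 2001). The reference gene is not named in this section, so the reference CT arguments, the function name, and the CT values below are assumptions made only for illustration.

```python
def relative_expression(ct_target_treated: float, ct_ref_treated: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Return the 2^(-ddCT) fold change of a target gene relative to the control."""
    # dCT: target CT normalized to a reference (housekeeping) gene in each sample.
    d_ct_treated = ct_target_treated - ct_ref_treated
    d_ct_control = ct_target_control - ct_ref_control
    # ddCT: treated sample compared with the untreated control.
    dd_ct = d_ct_treated - d_ct_control
    return 2.0 ** (-dd_ct)


# Hypothetical CT values: the target amplifies two cycles earlier after treatment.
print(relative_expression(22.0, 18.0, 24.0, 18.0))  # 4.0
# The untreated control compared with itself is 1 by construction.
print(relative_expression(24.0, 18.0, 24.0, 18.0))  # 1.0
```

In practice the CT values passed in would be means over the technical replicates of each biological replicate.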
Results
HSF Genes in Four Model Angiosperms
Most angiosperms fall into two large clades, Monocotyledoneae and Dicotyledoneae. The four model angiosperms, comprising one monocotyledon (O. sativa) and three dicotyledons (P. trichocarpa, A. thaliana, and V. vinifera), originate from a common paleohexapolyploid ancestor. Following the nomenclature proposed in a previous study, we obtained 22, 25, and 30 putative HSF genes from Arabidopsis, rice, and poplar, respectively. Because of updates to the poplar genome database, some poplar sequences had changed; these changes are shown in Table S1. We identified 10 HSFs in grapevine by searching the genome database with BlastP using AtHsf protein sequences as queries (Table 1, Table S2). The amino acid length, chromosomal location, molecular weight (Mw), and isoelectric point (pI) of each HSF gene are listed in Table S2.
The number of HSF genes in poplar, Arabidopsis, and rice was higher than in grapevine. Therefore, grapevine may have lost HSF genes faster than the other plants, or its HSF family may have expanded more slowly during evolution.
Phylogenetic and Structural Analysis of the HSF Genes
To investigate the evolutionary relationships within the HSF gene family, an unrooted phylogenetic tree of 30 poplar HSFs, 22 Arabidopsis HSFs, 25 rice HSFs, and 10 grapevine HSFs was constructed with 1000 bootstrap replicates (Fig. 1a). The tree was constructed in MEGA 6.0 using both the neighbor-joining (NJ) and maximum parsimony (MP) methods. The tree topologies produced by the two algorithms were largely comparable, with only minor differences at interior branches (data not shown). Therefore, only the NJ tree was used for further analysis (Fig. 1a). Phylogenetic analysis showed that the HSF genes from the four model angiosperms could be grouped into three classes, A, B, and C. The numbers of Class A proteins were higher in poplar and Arabidopsis than in the other species, with seventeen and sixteen Class A genes in poplar and Arabidopsis, respectively, suggesting that these genes had specifically expanded within the poplar and Arabidopsis genomes. Based on the nomenclature used in previous studies of Arabidopsis and rice, Class A and Class B were subdivided into eight (A1–8) and five (B1–5) clades according to their bootstrap values and phylogenetic relationships (Fig. 1a). In our analysis, AtHsf-19 (HSFA6a) did not cluster in clade A6 but grouped in clade A2. In addition, poplar, Arabidopsis, and grapevine each had only a single HSF gene in Class C, whereas rice had four. Thus, the Class C HSF genes appeared to have undergone expansion in the monocotyledons.
We then compared the exon/intron organization in the coding sequences of the HSF genes. Most closely related HSF members in the same subfamily shared very similar gene structures in terms of intron number and exon length. For example, all of the poplar, Arabidopsis, and rice HSF genes had two exons and one intron, except for PtHsf-14, PtHsf-24, OsHsf-08, OsHsf-09, OsHsf-24, AtHsf-01, and AtHsf-14, which gained one exon and intron, and PtHsf-14, which lost one exon and intron, during the long evolutionary period (Fig. 1b). However, in grapevine, the number of introns varied from 1 to 11, although most members had one to three introns.
We further examined the exon/intron organization of 20 paralogous pairs in these four species to investigate traceable intron gain or loss within these. Among these, only two paralogous pairs (OsHsf-08/-21 and PtHsf-16/-20) showed any difference. These differences might be derived from the exon/intron loss or gain event.
We used MEME motif detection software to reveal the diversification of HSF genes (Fig. 2). The details of the 30 putative motifs are shown in Table S3. Each of the putative motifs was annotated by searching Pfam and SMART (Finn and others 2013; Letunic and others 2012). The most common motif was motif 1. As expected, most of the closely related members had common motif composition, suggesting functional similarities among HSF proteins within the same subfamily.
Ongoing Individual Gene Duplication
To examine genetic divergence and expansion patterns within each HSF family, we further studied gene duplication events in the four model angiosperms (Table 2). The angiosperm genomes have undergone at least two rounds of whole-genome duplication during their evolutionary history (Tuskan and others 2006). Gene duplication, including segmental duplication and tandem duplication, is considered to be one of the primary driving forces in the evolution of genomes (Moore and Purugganan 2005). We first analyzed adjacent genes to determine whether tandem duplication had taken place. A pair of genes separated by five or fewer genes within a 100-kb region on a chromosome was considered a possible product of tandem duplication. According to this criterion, no pair was found to have been generated by tandem duplication. To investigate larger-scale duplication, we then searched for gene similarity in the regions flanking the HSF genes. If four or more of the 15 up- and downstream genes flanking two HSF genes achieved a best non-self match using BLASTP (E value < \(10^{-10}\)), we considered these gene pairs to be conserved and defined the two regions as derived from a large-scale duplication event (Li and others 2014). Pairs sharing only two or three conserved flanking genes were assigned to flexible sets. Eighteen paralogous gene pairs (31 genes) were localized to segmentally duplicated regions (Table 2). In poplar, 22 out of 30 HSF genes were located in large-scale duplications, and the flanking genes of each gene pair were conserved. However, only a few HSF-containing segments had matched duplicates in Arabidopsis and rice, namely OsHsf-20/-22, OsHsf-10/-19, OsHsf-12/-18, OsHsf-08/-25, and AtHsf-15/-20. The observation that multiple copies of HSF were retained in poplar is reasonable from an evolutionary perspective because HSFs confer various abiotic and biotic resistance traits to plants (Ogawa and others 2007; Sakuma and others 2006).
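The tandem-duplication criterion stated above (five or fewer intervening genes within a 100-kb region on the same chromosome) can be expressed as a small Python check. This is a sketch under assumptions: the `Gene` record, its field names, the use of start coordinates to measure the span, and the example genes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int  # start coordinate in bp
    order: int  # rank of the gene along its chromosome (0, 1, 2, ...)


def is_tandem_pair(a: Gene, b: Gene, max_between: int = 5,
                   max_span: int = 100_000) -> bool:
    """Return True if two genes satisfy the tandem-duplication criterion."""
    if a.chrom != b.chrom or a.name == b.name:
        return False
    genes_between = abs(a.order - b.order) - 1
    # Assumption: the 100-kb span is measured between gene start coordinates.
    return genes_between <= max_between and abs(a.start - b.start) <= max_span


# Hypothetical genes for illustration.
g1 = Gene("g1", "chr1", 10_000, 3)
g2 = Gene("g2", "chr1", 60_000, 7)    # 3 genes in between, 50 kb apart
g3 = Gene("g3", "chr1", 250_000, 8)   # too far from g1
g4 = Gene("g4", "chr2", 20_000, 4)    # different chromosome

print(is_tandem_pair(g1, g2), is_tandem_pair(g1, g3), is_tandem_pair(g1, g4))
```

Applying this check to every pair of HSF genes within a species would reproduce the kind of screen described, which returned no tandem pairs in this study.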
In poplar, paralogs were produced by both the recent and the ancient duplication. In the Populus HSF duplication network, when the two duplicated genes from the recent duplication could not both be found, we reasoned that an ancient gene loss event had probably occurred. Previous reports (Guo and others 2014; Tang and others 2008) indicated that the grapevine genome had undergone a γ event, so in theory an ancestral gene would have been copied into three genes. However, in the VvHsf gene family, we found no paralogs among the 10 VvHsf genes (Table 2). Thus, we suggest that many copies of HSF genes in grapevine may have been lost during the long-term evolutionary process.
Dating Duplication Events
Under the assumption that synonymous (silent) substitutions per site (Ks) accumulate at a constant rate over time, the conserved flanking protein-coding genes can be used to estimate the dates of the large-scale duplication events (Blanc and Wolfe 2004). For this analysis, we used only the duplicated blocks in our strict set (excluding the flexible sets). Each pair of proteins in a duplicated block was aligned at the amino acid level, and codons from gapless aligned regions were used to calculate Ks values. We discarded any Ks values above 2.0 because of the risk of saturation (Maher and others 2006). The mean Ks values and the estimated dates of duplication are shown in Table 2. For instance, based on the substitution rate of \(9.1 \times 10^{-9}\) substitutions per site per year previously proposed for poplar (Ma and others 2014), the duplication events for the 13 paralogs were estimated to have occurred between 8.16 and 31.45 million years ago (Mya). Additionally, PtHsf-19, PtHsf-10, and PtHsf-05 were all related via the ancient genome duplication, but the relationships between PtHsf-17 and PtHsf-10 or PtHsf-05 were uncertain. Therefore, PtHsf-17 might be the product of a segmental duplication of PtHsf-19. These results support the view that two whole-genome duplications played a key role in the expansion of the poplar HSF family.
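The saturation filter and block-level averaging can be sketched as follows. This is a minimal Python illustration, assuming the per-pair Ks values for a duplicated block have already been computed; the function name and the example values are hypothetical.

```python
from statistics import mean
from typing import List, Optional


def mean_block_ks(ks_values: List[float], cutoff: float = 2.0) -> Optional[float]:
    """Average Ks over a duplicated block after discarding values above the cutoff."""
    kept = [ks for ks in ks_values if ks <= cutoff]
    # A block whose values are all saturated gives no usable estimate.
    return mean(kept) if kept else None


# Hypothetical Ks values for the flanking gene pairs of one duplicated block.
block = [0.2, 0.3, 2.5]
print(mean_block_ks(block))       # the saturated value 2.5 is ignored
print(mean_block_ks([2.4, 3.1]))  # None: nothing left after filtering
```

Returning `None` for a fully saturated block keeps such blocks from being silently dated with unreliable values.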
Conserved Microsynteny Analysis
Using a stepwise gene-by-gene reciprocal comparison of the regions hosting the HSF genes, we observed strongly conserved microsynteny among Arabidopsis, rice, grapevine, and poplar (Fig. 3, Table S4). To provide a basic framework for classifying the HSFs, the 87 HSF genes, comprising 30 poplar HSFs, 22 Arabidopsis HSFs, 25 rice HSFs, and 10 grapevine HSFs, were divided into fifteen distinct groups (Fig S1, a–c) based on phylogenetic analysis. Each group contained at least one gene from poplar, Arabidopsis, rice, and grapevine, indicating that the genes in these different species may be derived from a common ancestor. Therefore, we referred to these 15 distinct groups as orthologous groups.
In each orthologous group, high levels of microsynteny were maintained between the members of three plants (Arabidopsis, grapevine, and rice) and networks of duplicated regions in poplar, each anchored by an HSF gene. Within an orthologous group, segments from different species were thought to have shared the ancient whole-genome duplication that occurred outside of the papilionoid lineage. Two representative synteny diagrams, for groups A4 and B2, are shown in Fig. 4. In group A4, PtHsf-09/OsHsf-04 and AtHsf-20/AtHsf-15 exhibited remarkable opposite-direction microsynteny, whereas PtHsf-09/VvHsf-08 and PtHsf-04/AtHsf-20 were aligned with flanking gene pairs in the same order but discordant transcriptional orientation. A high level of microsynteny existed in group B2, with most pairs in reverse order; in particular, PtHsf-23/AtHsf-22, OsHsf-24/OsHsf-14, and AtHsf-11/VvHsf-05 showed opposite-direction microsynteny. The duplicated pair PtHsf-23/PtHsf-27 also showed highly detectable linkage.
To estimate the extent of conserved gene content and order, synteny quality was assessed. The average synteny quality of the HSF genes across the four plants was 15.31 % (Table S4). Due to the large number of syntenic genes shared between poplar and grapevine, the synteny quality between these genomes was 23.59 %, which was higher than that observed for the At/Vv synteny blocks (16.88 %). The lowest synteny quality (9.99 %) was found between rice and Arabidopsis. Details of these comparative analyses are shown in Table S4. These results supported the orthology of the segment groups used in this study. The extremely low linear relationship among four model angiosperms also suggested that most of HSF genes in each species are descendants of different ancestral genes (Table S4).
Strong Purifying Selection for HSF Genes in Angiosperms
To better understand the evolutionary history of the HSF gene family in angiosperms, we measured the Ka/Ks ratios of HSF paralogs in the network of duplicated regions of poplar, Arabidopsis, and rice. Grapevine was excluded from this analysis due to the lack of HSF paralog pairs. Interestingly, the resulting data showed that all paralog pairs have Ka/Ks << 1 (Table S5, Fig. 5), indicating that the HSF family mainly evolved under purifying selection. To obtain further insight into the selection pressure, we performed a sliding window analysis of Ka/Ks between each pair of HSF paralogs. As expected from the basic Ka/Ks analysis, sliding window analysis clearly showed that numerous sites/regions were under moderate to strong negative selection (Fig. 6). However, there were still some conserved regions of HSFs subjected to positive selection, with Ka/Ks ratios above 1 (for example, PtHsf-17/-19 and OsHsf-10/-19).
These results showed that the HSF family in angiosperms is under purifying selection and that these genes are under strong evolutionary constraints that maintain their stable function. Additionally, a few conserved coding regions in the HSF gene family underwent positive selection, which may indicate that these HSF genes have acquired some new functions.
Expression Patterns of HSF Genes in Poplar
To gain insight into the possible functions of HSF genes, we re-analyzed the expression of poplar HSF genes using microarray data (Fig. 7). We found probe sets for 28 of the 30 PtHsf genes in the microarray data downloaded from NCBI (accession number GSE13990). Expression data for two genes (PtHsf-15 and PtHsf-18) were not detected in the platforms. Duplicated genes often have different evolutionary fates, that is, nonfunctionalization, neofunctionalization, or subfunctionalization, which may be indicated by divergence in their expression patterns (Wang and others 2015b). From these results, we determined that there were nine paralogous gene pairs in the poplar HSF gene family. As shown in Fig. 7, most pairs of paralogs in poplar shared similar expression patterns and thus showed functional redundancy. Seven paralogous pairs (PtHsf-22/-09, PtHsf-04/-29, PtHsf-07/-02, PtHsf-20/-16, PtHsf-10/-05, PtHsf-08/-21, and PtHsf-06/-25) derived from segmental duplication shared almost identical expression patterns across different tissues and various stresses. In contrast, two paralogous pairs (PtHsf-19/-17 and PtHsf-23/-27) showed dissimilar expression patterns. For example, PtHsf-19 showed the highest transcript abundance in female catkins, whereas its duplicated counterpart PtHsf-17 was preferentially expressed in xylem. The specific and varied expression profiles of HSF genes in poplar tissues suggested that they play diverse and novel roles, which was in agreement with the results of the phylogenetic and protein motif analyses.
Previous studies have shown that nine PtHsf genes respond to abiotic stresses including heat, NaCl, and CdCl2 (Zhang and others 2013a). To further investigate the function of the HSF gene family and its duplicated genes under different abiotic stress conditions, we performed qRT-PCR analysis of 30 HSF genes using RNA isolated from the leaves of 4-week-old poplar subjected to heat, cold, and H2O2 stress treatments (Jian and others 2008). The qRT-PCR results showed that all 30 genes were clearly up- or down-regulated compared to the controls (>two-fold or <0.5-fold) in at least one of the stress conditions examined (Fig. 8). By comparing the expression patterns of 20 segmentally duplicated genes, we found that nine duplicated gene pairs shared similar expression patterns under the three stress treatments. Most of these genes were responsive to more than one stress treatment. For example, PtHsf-15 was significantly up-regulated by all three stresses. Under heat treatment, all paralogous gene pairs showed similar up- or down-regulation, except for PtHsf-07/-02: PtHsf-02 was dramatically up-regulated, whereas PtHsf-07 was conspicuously down-regulated. Under cold treatment, only two duplicated gene pairs (PtHsf-03/-18 and PtHsf-11/-28) had different expression patterns; PtHsf-03 was up-regulated and PtHsf-18 was down-regulated. Under H2O2 treatment, the expression profiles of nine duplicated gene pairs were similar, whereas those of PtHsf-08/-21 differed: PtHsf-08 was slightly up-regulated, whereas PtHsf-21 was clearly down-regulated. It is noteworthy that seven genes (PtHsf-02, PtHsf-05, PtHsf-10, PtHsf-14, PtHsf-15, PtHsf-17, and PtHsf-19) responded strongly to heat stress in our experiments, indicating that they could have important roles in the heat shock regulatory network.
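The fold-change thresholds used above (>two-fold or <0.5-fold relative to the control) can be illustrated with a small Python sketch. Treating a paralogous pair as "similar" when both members receive the same call is an assumption made here for illustration, not a definition given in the text; the function names and fold-change values are hypothetical.

```python
def classify_response(fold_change: float, up: float = 2.0, down: float = 0.5) -> str:
    """Label a relative expression value as 'up', 'down', or 'unchanged'."""
    if fold_change > up:
        return "up"
    if fold_change < down:
        return "down"
    return "unchanged"


def pair_is_concordant(fc_a: float, fc_b: float) -> bool:
    """Assumed rule: a duplicated pair is concordant if both genes get the same label."""
    return classify_response(fc_a) == classify_response(fc_b)


# Hypothetical fold changes for two duplicated pairs under one treatment.
print(classify_response(5.3), classify_response(0.2), classify_response(1.1))
print(pair_is_concordant(5.3, 3.1))  # both up-regulated
print(pair_is_concordant(5.3, 0.2))  # divergent response
```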
Discussion
In this study, we performed a genome-wide comparison of HSF members from eudicots (P. trichocarpa, A. thaliana, and V. vinifera) and a monocot (O. sativa) to reconstruct gene gain and loss in angiosperms. The phylogenetic tree was quite informative for inferring the number of HSFs in the most recent common ancestor of the four species analyzed in this study (Nam and others 2004). Based on phylogenetic tree analysis, the HSF genes from the four species could be delineated into 14 well-supported ancient gene lineages, although representatives of one or two species are missing from some clades. Consequently, this analysis revealed that there were at least 14 HSF genes in the most recent common ancestor of these four species (Nam and others 2004). However, there was a large amount of variability in each clade, indicating lineage-specific gene gain and loss (Zhou and others 2000). Except for the shared clade representing Class C HSFs, all of the clades were confirmed by the subsequent interspecies microsynteny analysis (Wang and others 2015a).
Phylogenetic analysis of the HSF gene family revealed that the exon/intron structures and motif compositions of these genes are relatively conserved (Chen and others 2014; Feng and others 2014). Most closely related genes within the same subfamily shared similar gene structures. However, some motifs with unknown functions were present in nearly every subgroup, and these motifs might play important roles in the HSF family.
For large-scale duplication, we observed paralogous gene pairs in each species and used the level of synonymous nucleotide substitution to estimate the relative ages of gene duplication (Li and others 2014). We identified 22, eight, and two duplicated genes in poplar, rice, and Arabidopsis, accounting for 73.3, 32, and 9 % of the total HSF genes in each species, respectively (excluding the flexible set). The grapevine genome has not undergone recent genome duplication. The retention rate for PtHsf genes was higher than the average rate (~32 %) that resulted from the salicoid duplication event, suggesting that expansion of the PtHsf gene family was mainly due to the salicoid genome-wide duplication and rearrangement events. In our analysis, all duplicated HSF gene pairs were involved in large-scale duplication events, and there were no tandem duplicated gene pairs, which indicated that large-scale duplication (Messing and others 2004) rather than tandem duplication served as the most important driving force throughout the long period of poplar HSF gene evolution. After the shared polyploidy event and following divergence from the other angiosperms, the lineage leading to present-day poplar is known to have undergone a second whole-genome duplication approximately 13 Mya (Tuskan and others 2006). By calculating the duplication dates for the paralogous pairs, we concluded that all of the large-scale duplication events in the PtHsf gene family occurred between 8.16 and 31.45 Mya (Maher and others 2006); this duplication did not occur in the other three angiosperm lineages. Among the poplar HSF genes, eight were found as single copies on duplication blocks, namely PtHsf-01, -12, -13, -14, -15, -24, -26, and -30.
These results suggested that large-scale duplication had occurred as a continuous process and that dynamic changes might have occurred in a chromosomal segment that contained two ancestral HSF genes, leading to loss of the corresponding duplicated counterparts (Guo and others 2014). In addition, all grapevine genes were found as single copies on duplication blocks.
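The dating of paralogous pairs from synonymous substitution levels, as discussed above, can be illustrated with a short calculation. This is a minimal sketch and not necessarily the procedure used in this study: it assumes the commonly used molecular-clock relation T = Ks / (2λ), and both the rate `LAMBDA` and the Ks values of the two example pairs are hypothetical placeholders chosen only for illustration.

```python
# Hypothetical illustration: convert a synonymous substitution level (Ks)
# into an approximate duplication date with T = Ks / (2 * lambda).

# ASSUMPTION: clock-like rate in synonymous substitutions per site per year.
# This value is a placeholder and is not taken from the text above.
LAMBDA = 9.1e-9


def duplication_age_mya(ks: float, rate: float = LAMBDA) -> float:
    """Return the estimated age of a duplication in million years (Mya)."""
    if ks < 0:
        raise ValueError("Ks must be non-negative")
    if rate <= 0:
        raise ValueError("rate must be positive")
    # Both copies accumulate substitutions independently, hence the factor 2.
    return ks / (2 * rate) / 1e6


# Made-up Ks values for two hypothetical paralogous pairs.
example_pairs = {"pairA": 0.20, "pairB": 0.45}
ages = {name: duplication_age_mya(ks) for name, ks in example_pairs.items()}

for name, age in ages.items():
    print(f"{name}: ~{age:.2f} Mya")
```

Because the estimate scales inversely with the assumed rate, any absolute date obtained this way is only as reliable as the rate chosen for the lineage.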
Synteny-based approaches to clarify comparative genome arrangements, originally based on genetic maps, grow in value as we gain finer-scale information derived from physical maps and whole-genome sequences (Schlueter and others 2008). Angiosperms present both challenges and opportunities for synteny-based genome comparisons. In the fifteen orthologous groups, genes from poplar, Arabidopsis, rice, and grapevine exhibited obvious levels of microsynteny, which indicated that the HSF genes existed before the divergence of the three dicotyledon genomes (poplar, Arabidopsis, and grapevine) and the one monocotyledon genome (rice) (Cannon and others 2003; Wang and others 2015a). The analysis of microsynteny could help unravel the actual evolutionary relationships between HSF regions among the four species by taking advantage of the surrounding genomic sequences. However, small insertions/deletions and inversions were observed between regions after the divergence in most HSF genes (Fig S1, a–c). Almost all HSF-containing regions in dicotyledons (poplar, Arabidopsis, and grapevine) showed relatively close relationships with the orthologous duplicated regions in monocotyledons (rice). In most cases, a single region of Arabidopsis, rice, and grapevine was syntenic to two or more duplicated regions of poplar (Jaillon and others 2007). This indicated that the paleohexaploidy observed in Arabidopsis, rice, and grapevine was already present in their common ancestor with poplar.
Synteny and collinearity analyses of plant genomes have suggested that an ancient genome triplication (γ-triplication) event occurred in the common ancestor of Arabidopsis, rice, poplar, and grapevine, resulting in a paleohexaploid (Tang and others 2008). After the γ-triplication event, Arabidopsis was affected by two more recent paleopolyploidy events: the β- and α-duplications. The latter was the most recent, occurring approximately 40 million years (MYs) ago (Blanc and others 2003). In poplar, there was a duplication event specific to its own salicoid lineage (ρ-duplication), which occurred between the β- and α-duplication events about 65 MYs ago (Tuskan and others 2006). Rice experienced its own duplication event (gramineous differentiation) in addition to the γ-triplication (Guyot and Keller 2004). By contrast, grapevine experienced only the common γ-triplication event and no subsequent polyploidies. Based on the above discussion, the multiplicity ratio for an ancestral gene in the above species should be 4:2:2:1. In fact, we retrieved 22, 25, 30, and ten putative HSF genes from Arabidopsis, rice, poplar, and grapevine, so the current ratio of HSF genes among the four species was 2.2:2.5:3:1. Compared with poplar, the other three species showed lower-than-expected current ratios of HSF genes. Therefore, HSF genes may have expanded faster in poplar than in the other three species, or poplar may have lost fewer genes than did the other species.
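The ratio arithmetic in the paragraph above can be reproduced with a few lines of Python. This is a minimal sketch for checking the numbers: the gene counts and the 4:2:2:1 expectation come from the text, whereas the assignment of the expected values to Arabidopsis, rice, poplar, and grapevine (in that order) is an assumption based on the order in which the species are listed, and the function name `normalized_ratio` is hypothetical.

```python
# Putative HSF gene counts reported in the text above.
hsf_counts = {"Arabidopsis": 22, "rice": 25, "poplar": 30, "grapevine": 10}

# Expected multiplicity per ancestral gene (4:2:2:1).
# ASSUMPTION: the order Arabidopsis, rice, poplar, grapevine is inferred
# from the order in which the species are listed in the text.
expected = {"Arabidopsis": 4, "rice": 2, "poplar": 2, "grapevine": 1}


def normalized_ratio(counts: dict, reference: str) -> dict:
    """Divide every count by the count of the reference species."""
    ref = counts[reference]
    if ref == 0:
        raise ValueError("reference count must be non-zero")
    return {species: n / ref for species, n in counts.items()}


# Observed ratio, normalized to grapevine (no lineage-specific polyploidy).
observed = normalized_ratio(hsf_counts, "grapevine")

# Observed copy number relative to the expected multiplicity.
relative = {sp: observed[sp] / expected[sp] for sp in observed}

for sp in hsf_counts:
    print(f"{sp}: observed {observed[sp]:.1f}, expected {expected[sp]}, "
          f"observed/expected {relative[sp]:.2f}")
```

Normalizing to grapevine is a natural choice here because it is the only one of the four species described as having no polyploidy after the γ-triplication.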
The Ka/Ks ratios of the 18 duplication pairs showed that these gene pairs were subjected to purifying selection. Furthermore, the relatively higher Ka/Ks ratios for the PtHsf-19/-17 and PtHsf-19/-10 gene pairs suggested that they experienced rapid evolutionary diversification following duplication (Zhang and others 2006). Purifying selection probably played a key role in maintaining the long-term stability of the biological structures of plant HSFs by removing deleterious mutations, thus ensuring that gene functions were maintained as long as they were needed.
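The interpretation of Ka/Ks ratios used above follows the standard convention that a ratio below 1 indicates purifying selection, a ratio of about 1 indicates neutral evolution, and a ratio above 1 indicates positive selection. The snippet below is a minimal sketch of that rule; the function name `classify_selection`, the tolerance `tol`, and the Ka and Ks values are all hypothetical and are not results from this study.

```python
def classify_selection(ka: float, ks: float, tol: float = 1e-9) -> str:
    """Classify the selection regime of a gene pair from its Ka/Ks ratio."""
    if ks <= 0:
        raise ValueError("Ks must be positive to compute Ka/Ks")
    if ka < 0:
        raise ValueError("Ka must be non-negative")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tol:
        return "neutral"
    # Ratio below 1: nonsynonymous changes are removed (purifying selection).
    # Ratio above 1: nonsynonymous changes are favored (positive selection).
    return "purifying" if ratio < 1.0 else "positive"


# Made-up (Ka, Ks) values for three hypothetical gene pairs.
example_pairs = {
    "pairA": (0.05, 0.25),
    "pairB": (0.30, 0.30),
    "pairC": (0.60, 0.30),
}

for name, (ka, ks) in example_pairs.items():
    print(f"{name}: Ka/Ks = {ka / ks:.2f} -> {classify_selection(ka, ks)}")
```

In practice a statistical test is usually needed before a ratio close to 1 is interpreted, so the hard threshold used here should be read as a simplification.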
Our understanding of HSF evolution served as a starting point for elucidating their complex regulatory roles. Expression data provided some insight into the functional divergence of duplicated HSF genes by capturing differences in specific tissues (Barrett and Edgar 2006). We chose to use the GSE13990 data set because it could distinguish between different HSF gene loci and had nine different tissue-specific libraries for comparing expression profiles. The expression data for PtHsf-15 and PtHsf-18 were not included in the database. This might reflect low or cell-specific expression of these two HSF genes, or limited sensitivity of the assay. Therefore, the expression data served as a good starting point for understanding the expression patterns within HSF families, but would need to be expanded to provide a full picture of the temporal and spatial expression patterns of HSF genes. Most of the duplicated gene pairs that shared high levels of sequence homology showed similar expression patterns, e.g., PtHsf-04/-29, PtHsf-22/-09, and PtHsf-25/-06, suggesting that these duplicated genes did not diverge substantially after duplication and had retained redundant functions in regulating tissue development. In contrast, only two duplicated gene pairs (PtHsf-17/-19 and PtHsf-23/-27) showed dissimilar expression patterns. Overall, we demonstrated that plant HSF gene families evolved through duplication events similar to those that drove the evolution of other protein-coding genes, and that some of the duplicated copies have taken on new expression patterns, potentially resulting in neo- and subfunctionalization. The evolutionary relationships within an HSF family, in conjunction with public expression data, enabled us to explore the subsequent functional divergence of duplicated genes and could be used for further experimental analysis of their interactions with target mRNA and the resulting regulatory effects in plant development.
Recent studies have reported that the expression of HSF genes was strongly induced by heat, cold, salt, and osmotic stress in Arabidopsis, rice, and Lonicera (Mittal and others 2009; Nishizawa and others 2006; Peng and others 2015). In our study, poplar seedlings were subjected to three different stress treatments, and the results suggested that heat induced HSF expression more strongly than oxidative and cold stress (Lin and others 2011). We found three subclasses, HsfA2 (PtHsf-14), HsfA6 (PtHsf-17 and PtHsf-19), and HsfA7 (PtHsf-05 and PtHsf-10), that showed markedly higher expression following heat stress treatment (Nishizawa and others 2006; Schramm and others 2006). Our results indicated that these three subclasses played a key role in the heat shock response in poplar. Notably, only one HSF gene assigned to Class B (PtHsf-15) appeared strongly up-regulated by heat stress. The HSF proteins belonging to Class B lack certain structural features of the Class A activator HSFs. This supports the proposal that Class B HSFs act as transcriptional co-repressors cooperating with Class A HSFs. It is possible that the HSF genes with unaltered or down-regulated expression lie downstream in the hierarchy of events involved in the heat shock response, or that they were repressed by other members of the gene family. It was noteworthy that many poplar HSF genes induced by heat stress were also induced by oxidative stress, and PtHsf-15 was induced by all three stresses tested. These findings support the suggestion that HSFs serve as important sensors for H2O2 in plants and could be pivotal in linking the heat shock response with other stress-responsive signaling networks (Miller and others 2008). Most duplicated gene pairs shared similar expression patterns under the three stress treatments in this study, suggesting that these sets of duplicated HSF genes may perform a basic, important role in poplar.
In summary, based on sequence similarity, phylogenetic distance, and gene collinearity, we tracked the differential retention and expansion of ancestral HSF genes among four model angiosperms. The results revealed that most of the HSF genes in each species are descendants of different ancestral genes. This study also highlighted the importance of learning the origin of a gene when analyzing its function across different species.
References
Adams KL, Wendel JF (2005) Polyploidy and genome evolution in plants. Curr Opin Plant Biol 8:135–141. doi:10.1016/j.pbi.2005.01.001
Barrett T, Edgar R (2006) Gene expression omnibus: microarray data storage, submission, retrieval, and analysis. Methods Enzymol 411:352–369
Bateman A et al (2004) The Pfam protein families database. Nucleic Acids Res 32:D138–D141
Blanc G, Wolfe KH (2004) Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes. Plant Cell 16:1667–1678
Blanc G, Hokamp K, Wolfe KH (2003) A recent polyploidy superimposed on older large-scale duplications in the Arabidopsis genome. Genome Res 13:137–144
Bryfczynski SP, Pargas RP (2009) GraphPad: a graph creation tool for CS2/CS7. In: ACM SIGCSE Bulletin. vol 3. ACM, pp 389–389
Cannon SB et al (2003) Evolution and microsynteny of the apyrase gene family in three legume genomes. Mol Genet Genomics 270:347–361
Chai G et al (2012) Comprehensive analysis of CCCH zinc finger family in poplar (Populus trichocarpa). BMC Genom 13:253
Chen X, Chen Z, Zhao H, Zhao Y, Cheng B, Xiang Y (2014) Genome-wide analysis of soybean HD-Zip gene family and expression profiling under salinity and drought treatments. PloS one. doi:10.1371/journal.pone.0087156
Chen W-J et al (2015) Significant microsynteny with new evolutionary highlights is detected through comparative genomic sequence analysis of maize CCCH IX gene subfamily. Int J Genomics. doi:10.1155/2015/824287
Clos J, Westwood JT, Becker PB, Wilson S, Lambert K, Wu C (1990) Molecular cloning and expression of a hexameric Drosophila heat shock factor subject to negative regulation. Cell 63:1085–1097
Durrant-Whyte H, Bailey T (2006) Simultaneous localization and mapping: part I. IEEE Robot Autom Mag 13:99–110
Feng L, Chen Z, Ma H, Chen X, Li Y, Wang Y, Xiang Y (2014) The IQD gene family in soybean: structure, phylogeny, evolution and expression. PloS one 9(10):e110896
Finn RD et al (2013) Pfam: the protein families database. Nucleic Acids Res. doi:10.1093/nar/gkv1344
Gasteiger E, Gattiker A, Hoogland C, Ivanyi I, Appel RD, Bairoch A (2003) ExPASy: the proteomics server for in-depth protein knowledge and analysis. Nucleic Acids Res 31:3784–3788
Giorno F, Guerriero G, Baric S, Mariani C (2012) Heat shock transcriptional factors in Malus domestica: identification, classification and expression analysis. BMC Genom 13:639
Goodstein DM et al (2012) Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res 40:D1178–D1186
Guidone A, Zotta T, Matera A, Ricciardi A, De Filippis F, Ercolini D, Parente E (2016) The microbiota of high-moisture mozzarella cheese produced with different acidification methods. Int J Food Microbiol 216:9–17
Guo A, Zhu Q, Chen X, Luo J (2007) GSDS: a gene structure display server. Yi chuan = Hereditas/Zhongguo yi chuan xue hui bian ji 29:1023–1026
Guo J et al (2008) Genome-wide analysis of heat shock transcription factor families in rice and Arabidopsis. J Genet Genom 35:105–118
Guo L, Chen Y, Ye N, Dai X, Yang W, Yin T (2014) Differential retention and expansion of the ancestral genes associated with the paleopolyploidies in modern rosid plants, as revealed by analysis of the extensins super-gene family. BMC Genom 15:612
Guyot R, Keller B (2004) Ancestral genome duplication in rice. Genome 47:610–614
Horiuchi H (2009) Functional diversity of chitin synthases of Aspergillus nidulans in hyphal growth, conidiophore development and septum formation. Med Mycol 47:S47–S52
Hu R, Qi G, Kong Y, Kong D, Gao Q, Zhou G (2010) Comprehensive analysis of NAC domain transcription factor gene family in Populus trichocarpa. BMC Plant Biol 10:145
Huang Y et al (2015) Heat shock factors in carrot: genome-wide identification, classification, and expression profiles response to abiotic stress. Mol Biol Rep 42:893–905
Jaillon O et al (2007) The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 449:463–467
Jian B, Liu B, Bi Y, Hou W, Wu C, Han T (2008) Validation of internal control for gene expression study in soybean by quantitative real-time PCR. BMC Mol Biol 9:59
Kellogg EA (2001) Evolutionary history of the grasses. Plant Physiol 125:1198–1205
Letunic I et al (2004) SMART 4.0: towards genomic data integration. Nucleic Acids Res 32:D142–D144
Letunic I, Doerks T, Bork P (2012) SMART 7: recent updates to the protein domain annotation resource. Nucleic Acids Res 40:D302–D305
Li Z et al (2014) Molecular evolution of the HD-ZIP I gene family in legume genomes. Gene 533:218–228
Librado P, Rozas J (2009) DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics 25:1451–1452
Lin Y-X, Jiang H-Y, Chu Z-X, Tang X-L, Zhu S-W, Cheng B-J (2011) Genome-wide identification, classification and analysis of heat shock transcription factor family in maize. BMC Genom 12:76
Lin Y, Cheng Y, Jin J, Jin X, Jiang H, Yan H, Cheng B (2014a) Genome duplication and gene loss affect the evolution of heat shock transcription factor genes in legumes. PloS one 9:e102825
Lin Y, Cheng Y, Jin J, Jin X, Jiang H, Yan H, Cheng B (2014b) Genome duplication and gene loss affect the evolution of heat shock transcription factor genes in legumes. PloS one 9(7):e102825
Livak KJ, Schmittgen TD (2001) Analysis of relative gene expression data using real-time quantitative PCR and the 2−ΔΔCT method. Methods 25:402–408
Ma H, Feng L, Chen Z, Chen X, Zhao H, Xiang Y (2014) Genome-wide identification and expression analysis of the IQD gene family in Populus trichocarpa. Plant Sci 229:96–110
Maher C, Stein L, Ware D (2006) Evolution of Arabidopsis microRNA families through duplication events. Genome Res 16:510–519
Messing J et al (2004) Sequence composition and genome organization of maize. Proc Natl Acad Sci USA 101:14349–14354
Miller G, Shulaev V, Mittler R (2008) Reactive oxygen signaling and abiotic stress. Physiol Plant 133:481–489
Mittal D, Chakrabarti S, Sarkar A, Singh A, Grover A (2009) Heat shock factor gene family in rice: genomic organization and transcript expression profiling in response to high temperature, low temperature and oxidative stresses. Plant Physiol Biochem 47:785–795
Moore RC, Purugganan MD (2005) The evolutionary dynamics of plant duplicate genes. Curr Opin Plant Biol 8:122–128
Nam J, Kim J, Lee S, An G, Ma H, Nei M (2004) Type I MADS-box genes have experienced faster birth-and-death evolution than type II MADS-box genes in angiosperms. Proc Natl Acad Sci USA 101:1910–1915
Nishizawa A, Yabuta Y, Yoshida E, Maruta T, Yoshimura K, Shigeoka S (2006) Arabidopsis heat shock transcription factor A2 as a key regulator in response to several types of environmental stress. Plant J 48:535–547
Ogawa D, Yamaguchi K, Nishiuchi T (2007) High-level overexpression of the Arabidopsis HsfA2 gene confers not only increased themotolerance but also salt/osmotic stress tolerance and enhanced callus growth. J Exp Bot 58:3373–3383
Otto SP (2007) The evolutionary consequences of polyploidy. Cell 131:452–462
Paterson AH et al (2000) Comparative genomics of plant chromosomes. Plant Cell 12:1523–1539
Peng X, Wu Q, Teng L, Tang F, Pi Z, Shen S (2015) Transcriptional regulation of the paper mulberry under cold stress as revealed by a comprehensive analysis of transcription factors. BMC Plant Biol 15:108
Sakuma Y, Maruyama K, Qin F, Osakabe Y, Shinozaki K, Yamaguchi-Shinozaki K (2006) Dual function of an Arabidopsis transcription factor DREB2A in water-stress-responsive and heat-stress-responsive gene expression. Proc Natl Acad Sci USA 103:18822–18827
Sato S et al (2008) Genome structure of the legume, Lotus japonicus. DNA Res 15:227–239
Scharf K-D, Berberich T, Ebersberger I, Nover L (2012) The plant heat stress transcription factor (Hsf) family: structure, function and evolution. Biochim et Biophys Acta (BBA) 1819:104–119
Schlueter JA, Scheffler BE, Jackson S, Shoemaker RC (2008) Fractionation of synteny in a genomic region containing tandemly duplicated genes across Glycine max, Medicago truncatula, and Arabidopsis thaliana. J Hered 99:390–395
Schramm F, Ganguli A, Kiehlmann E, Englich G, Walch D, von Koskull-Döring P (2006) The heat stress transcription factor HsfA2 serves as a regulatory amplifier of a subset of genes in the heat stress response in Arabidopsis. Plant Mol Biol 60:759–772
Song X et al (2014) Genome-wide identification, classification and expression analysis of the heat shock transcription factor family in Chinese cabbage. Mol Genet Genom 289:541–551
Sorger PK, Pelham HR (1988) Yeast heat shock factor is an essential DNA-binding protein that exhibits temperature-dependent phosphorylation. Cell 54:855–864
Sturn A, Quackenbush J, Trajanoski Z (2002) Genesis: cluster analysis of microarray data. Bioinformatics 18:207–208
Swindell WR, Huebner M, Weber AP (2007) Transcriptional profiling of Arabidopsis heat shock proteins and transcription factors reveals extensive overlap between heat and non-heat stress response pathways. BMC Genom 8:125
Tamura K, Stecher G, Peterson D, Filipski A, Kumar S (2013) MEGA6: molecular evolutionary genetics analysis version 6.0. Mol Biol Evol 30:2725–2729
Tang H, Bowers JE, Wang X, Ming R, Alam M, Paterson AH (2008) Synteny and collinearity in plant genomes. Science 320:486–488
Tuskan GA et al (2006) The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313:1596–1604
Wang L et al (2010) Expression profiling and integrative analysis of the CESA/CSL superfamily in rice. BMC Plant Biol 10:282
Wang F, Dong Q, Jiang H, Zhu S, Chen B, Xiang Y (2012) Genome-wide analysis of the heat shock transcription factors in Populus trichocarpa and Medicago truncatula. Mol Biol Rep 39:1877–1886
Wang Y, Feng L, Zhu Y, Li Y, Yan H, Xiang Y (2015a) Comparative genomic analysis of the WRKY III gene family in populus, grape, arabidopsis and rice. Biol Direct 10:1–27
Wang Y, Wang Q, Zhao Y, Han G, Zhu S (2015b) Systematic analysis of maize class III peroxidase gene family reveals a conserved subfamily involved in abiotic stress response. Gene 566:95–108
Wilkins O, Nahal H, Foong J, Provart NJ, Campbell MM (2009) Expansion and diversification of the Populus R2R3-MYB family of transcription factors. Plant Physiol 149:981–993
Zhang Z, Li J, Yu J (2006) Computing Ka and Ks with a consideration of unequal transitional substitutions. BMC Evol Biol 6:44
Zhang H, Yang J, Chen Y, Mao X, Wang Z, Li C (2013a) Identification and expression analysis of the heat shock transcription factor (HSF) gene family in Populus trichocarpa. Plant Omics 6:415
Zhang J, Li J, Liu B, Zhang L, Chen J, Lu M (2013b) Genome-wide analysis of the Populus Hsp90 gene family reveals differential expression patterns, localization, and heat stress responses. BMC Genom 14:532
Zhang J et al (2015) Hsf and Hsp gene families in Populus: genome-wide identification, organization and correlated expression during development and in stress responses. BMC Genom 16:1–19
Zhou Q, Wang S, Anderson DJ (2000) Identification of a novel family of oligodendrocyte lineage-specific basic helix–loop–helix transcription factors. Neuron 25:331–343
Acknowledgments
This work was supported by grants from the National Natural Science Foundation of China (No. 31370561), the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20133418110005), the Anhui Provincial Natural Science Foundation (No. 1308085MC36), and the Anhui Agricultural University disciplinary construction Foundation (No. XKTS2013001).
Author Contributions
Conceived and designed the experiments: YXZ YYW LF. Performed the experiments: YXZ. Analyzed the data: YXZ HWY YYW LF ZC. Wrote the paper: YXZ HWY YYW. Participated in the design of this study and revised manuscript: YXZ HWY YYW.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Additional information
The authors Yuxin Zhu and Hanwei Yan have contributed equally to this work.
Cite this article
Zhu, Y., Yan, H., Wang, Y. et al. Genome Duplication and Evolution of Heat Shock Transcription Factor (HSF) Gene Family in Four Model Angiosperms. J Plant Growth Regul 35, 903–920 (2016). https://doi.org/10.1007/s00344-016-9590-5
DOI: https://doi.org/10.1007/s00344-016-9590-5