Introduction

Dehydrins (DHNs) are Group II (D-11 family) late embryogenesis abundant (LEA) proteins that accumulate during seed desiccation and, in vegetative or reproductive tissues, in response to water deficit induced by drought, low temperature or salinity (Close 1996; Allagulova et al. 2003; Kosova et al. 2007; Tunnacliffe and Wise 2007). Certain DHN proteins have been attributed a vital role in bud dormancy and cold acclimation of trees (Rinne et al. 2010; HongXia et al. 2009; Rohde et al. 2007; Rorat 2006). DHNs are widely distributed across the plant kingdom, including all seed plants, nonvascular plants and seedless vascular plants, where they accumulate in different cell compartments, mostly in the cytoplasm and nucleus (Battaglia et al. 2008; Tunnacliffe and Wise 2007; Allagulova et al. 2003).

The distinctive sequence feature of all DHN proteins is a conserved, Lys-rich 15-residue motif, EKKGIMDKIKEKLPG, named the K-segment, which is often present in one to 11 copies within a single protein. Other, optional motifs in DHNs are the Y-segment ([V/T]D[E/Q]YGNP), usually found in one to three tandem copies near the N-terminus; the S-segment, containing a tract of Ser residues; and the less conserved Ф-segments, which are rich in polar amino acids and lie interspersed between K-segments (Close 1996; Allagulova et al. 2003). The presence and arrangement of these conserved motifs allow DHN proteins to be classified into five subgroups: YnSKn, Kn, SKn, KnS, and YnKn (Rorat 2006; Allagulova et al. 2003). In addition, some DHNs can only be assigned to intermediate forms outside these five subgroups, such as the SK3S arrangement in a chickweed DHN protein (Z21500; Close 1996). Considerable research effort has been devoted to DHN structure and function in herbaceous model plants such as Arabidopsis, maize and barley, but comparable in-depth studies have not yet been directed towards woody trees.

DHNs are encoded by a multigene family (Hundertmark and Hincha 2008). Recent studies, together with the release of complete genome sequences for different organisms, have enabled the genome-wide identification of DHNs in individual plant genomes. In previous reports, 12, 9, and 10 DHN genes were identified in Arabidopsis using different methods (Hundertmark and Hincha 2008; Tunnacliffe and Wise 2007; Alsheikh et al. 2005), 13 in barley (Choi et al. 1999; Rodriguez et al. 2005), and 8 in rice (Wang et al. 2007). In poplar, so far only a single SK2-type DHN has been identified in different poplars, and its responses to various stresses were confirmed (Bae et al. 2009; Caruso et al. 2002). Although genes encoding DHNs have been identified in several plant species, there is still no comprehensive and systematic study characterizing all DHN genes in a single woody plant genome. To explore all genes encoding DHN proteins in poplar, we searched the complete Populus trichocarpa genome using a domain search. Here, we present the identification and analysis of DHN proteins and their genes in P. trichocarpa. To our knowledge, this is the first systematic characterization of all DHN-encoding genes in a single woody plant genome, and it provides a basis for future in vivo studies of the function of each poplar DHN.

Methods

Identification and chromosomal location of poplar DHN genes

The complete protein sequence database of P. trichocarpa v1.1 was downloaded from JGI (www.jgi.doe.gov/poplar). The Hidden Markov Model (HMM) profile (dehydrin.hmm) of the Pfam Dehydrin domain (PF00257) was downloaded from the Pfam database (http://pfam.sanger.ac.uk/). This profile was used as a query to identify DHN genes in the poplar protein database with the hmmsearch program of HMMER (v3.0), which is widely used to identify homologues of a protein family of interest (Finn et al. 2010; Eddy 2009). All non-redundant hits with E-values below 0.1 were collected, and each was then searched with BLASTP against the NCBI RefSeq non-redundant (Nr) protein database (http://www.ncbi.nlm.nih.gov/). Expressed sequence tags (ESTs) were retrieved by online BLASTN searches of all Populus EST sequences in NCBI, using the corresponding transcript/CDS from P. trichocarpa v1.1 (www.jgi.doe.gov/poplar) as the query. Matches with more than 95% identity over an alignment of at least 100 bp were considered sequences corresponding to the dehydrin genes. Multiple sequence alignments of these ESTs with their respective transcript/CDS sequences were performed with the ClustalW program in BioEdit using default parameters (Hall 1999) and were manually adjusted to maximize matching.
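The two threshold filters above (E-value below 0.1 for the HMM search; more than 95% identity over at least 100 bp for EST matches) can be sketched in Python. This is a minimal sketch, assuming the HMM search was run locally with `hmmsearch --tblout` (HMMER 3 per-sequence table, whose first column is the target name and fifth column the full-sequence E-value) and the EST search with BLAST+ tabular output `-outfmt 6` (third column percent identity, fourth column alignment length). The function names and example hit lines are hypothetical; the study itself used the online NCBI interface for BLAST, so this is an illustration rather than its exact workflow.

```python
"""Filter HMMER3 and BLAST+ tabular output using the thresholds in the text."""
from typing import Iterable, List, Tuple


def hmmsearch_hits(tblout_lines: Iterable[str],
                   max_evalue: float = 0.1) -> List[Tuple[str, float]]:
    """Return non-redundant (target, E-value) pairs from `hmmsearch --tblout` lines."""
    hits: List[Tuple[str, float]] = []
    seen = set()
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        target, evalue = fields[0], float(fields[4])  # column 5 = full-sequence E-value
        if evalue < max_evalue and target not in seen:
            seen.add(target)  # keep only the first row per target
            hits.append((target, evalue))
    return hits


def est_matches(blast_lines: Iterable[str], min_identity: float = 95.0,
                min_length: int = 100) -> List[Tuple[str, str, float, int]]:
    """Keep BLASTN `-outfmt 6` rows with identity > min_identity and length >= min_length."""
    kept: List[Tuple[str, str, float, int]] = []
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        pident, length = float(fields[2]), int(fields[3])
        if pident > min_identity and length >= min_length:
            kept.append((query, subject, pident, length))
    return kept


if __name__ == "__main__":
    tbl = [
        "# target name  accession  query name ...",
        "geneA  -  Dehydrin  PF00257  3.2e-25  88.1  4.0",
        "geneB  -  Dehydrin  PF00257  0.45  2.1  0.3",
    ]
    print(hmmsearch_hits(tbl))  # geneB is dropped (E-value 0.45 >= 0.1)
```

Parsing the tabular outputs keeps the thresholds explicit in code; `hmmsearch` can also apply the E-value cut-off itself through its `-E` option.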

The 11 identified DHN genes were located in the P. trichocarpa genome using NCBI Map Viewer (http://www.ncbi.nlm.nih.gov/projects/mapview/). Duplicated regions between chromosomes were identified as described in Tuskan et al. (2006). Tandem duplication in poplar was determined using the criterion that five or fewer gene loci occur within a 100-kb region (Hu et al. 2010; Finn et al. 2006).
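The tandem-duplication criterion can be expressed as a small clustering routine. This is a sketch under one reading of the criterion, which is an assumption: two family members are grouped when they lie on the same chromosome within 100 kb of each other and are separated by at most five other annotated gene loci. The `Locus` record, its `order_index` field (the gene's rank in the chromosome's gene order) and the example coordinates are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Locus(NamedTuple):
    gene_id: str
    chrom: str
    start: int        # start coordinate in bp
    order_index: int  # hypothetical: rank of the gene among all annotated loci on the chromosome


def tandem_clusters(family: List[Locus], max_distance: int = 100_000,
                    max_intervening: int = 5) -> List[List[str]]:
    """Group family members into tandem clusters (assumed reading of the criterion)."""
    by_chrom: Dict[str, List[Locus]] = {}
    for locus in family:
        by_chrom.setdefault(locus.chrom, []).append(locus)

    clusters: List[List[str]] = []
    for chrom in sorted(by_chrom):
        loci = sorted(by_chrom[chrom], key=lambda x: x.start)
        current = [loci[0]]
        for prev, cur in zip(loci, loci[1:]):
            close = cur.start - prev.start <= max_distance
            few_between = cur.order_index - prev.order_index - 1 <= max_intervening
            if close and few_between:
                current.append(cur)  # extend the running cluster
            else:
                if len(current) > 1:
                    clusters.append([x.gene_id for x in current])
                current = [cur]      # start a new candidate cluster
        if len(current) > 1:
            clusters.append([x.gene_id for x in current])
    return clusters
```

Chaining neighbouring genes (rather than requiring every pair to be within 100 kb) is a design choice; a cluster such as Cluster I, with three genes inside 20 kb, is recovered either way.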

Identification of conserved motifs

Motifs were extracted from 34 DHN protein sequences of poplar, Arabidopsis and barley with the online version of MEME 4.6.1 (Multiple Em for Motif Elicitation), one of the most widely used tools for discovering new sequence patterns in biological sequences and assessing their significance (Bailey and Elkan 1994; Bailey et al. 2006). MEME was run with the following parameters: number of sites per motif between 2 and 120; distribution of motif occurrences, any number of repetitions; maximum number of motifs, 15; and motif width between 8 and 16 residues.
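For readers reproducing the motif search locally, the MEME settings listed above map onto the command-line options of a MEME installation roughly as follows. This is a sketch assuming MEME 4.x options (`-mod anr`, `-nmotifs`, `-minw`/`-maxw`, `-minsites`/`-maxsites`) and reading "number of sites per motif" as `-minsites`/`-maxsites`; the study itself used the online version, and the input file name is hypothetical.

```python
import shlex
from typing import List


def meme_command(fasta: str, outdir: str = "meme_out") -> List[str]:
    """Build a MEME argument list mirroring the motif-search parameters in the text."""
    return [
        "meme", fasta,
        "-protein",                            # protein alphabet
        "-oc", outdir,                         # output directory (overwritten if present)
        "-mod", "anr",                         # any number of repetitions per sequence
        "-nmotifs", "15",                      # maximum number of motifs
        "-minw", "8", "-maxw", "16",           # motif width range (residues)
        "-minsites", "2", "-maxsites", "120",  # sites per motif (assumed reading)
    ]


if __name__ == "__main__":
    # Print a shell-ready command; passing the list to subprocess.run() would execute it.
    print(shlex.join(meme_command("dhn_34_proteins.fasta")))
```

Building the arguments as a list avoids shell-quoting problems when the command is later run with `subprocess.run`.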

Phylogenetic analysis and in silico microarray analysis

Multiple sequence alignments of the full-length protein sequences were performed with ClustalW in BioEdit using default parameters (Hall 1999). From these alignments, unrooted phylogenetic trees were constructed with MEGA 5.0 (Tamura et al. 2011) by both the neighbor-joining method (Saitou and Nei 1987) and the minimum evolution method, using p-distance and complete deletion. The reliability of the trees was assessed by bootstrap analysis with 1000 replicates. Probe sets corresponding to each poplar DHN gene were retrieved with the online probe match tool of the NetAffx™ Analysis Center (http://www.affymetrix.com/analysis/index.affx). Relative transcript abundance values of all poplar DHN genes in various tissues were obtained from the poplar transcript abundance datasets (Wilkins et al. 2009) through the Populus electronic fluorescent pictograph browser (Poplar eFP browser; http://bar.utoronto.ca/efppop/cgi-bin/efpWeb.cgi), whose data originate from the NCBI Gene Expression Omnibus (accession number GSE13990). For genes with more than one probe set, the mean expression value was used; genes sharing the same probe set were assigned the same transcript abundance. Expression data were normalized and hierarchically clustered (average linkage, Pearson correlation) with Cluster 3.0 (de Hoon et al. 2004), and the resulting dendrogram and heat map were visualized with Java TreeView 1.1 (Saldanha 2004).
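The clustering step (average linkage on Pearson correlation) can be sketched without external libraries as follows. This is a simplified stand-in for Cluster 3.0, not its implementation: it assumes the distance 1 − r between expression profiles (r being the Pearson correlation), omits Cluster 3.0's normalization options, and uses hypothetical gene names and values.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Cluster = Tuple[str, ...]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # raises ZeroDivisionError for a constant profile


def average_linkage(profiles: Dict[str, List[float]]) -> List[Tuple[Cluster, Cluster, float]]:
    """Agglomerative clustering with average linkage on the distance 1 - r.

    Returns the merge steps as (cluster_a, cluster_b, distance) in merge order.
    """
    dist = {frozenset(pair): 1.0 - pearson(profiles[pair[0]], profiles[pair[1]])
            for pair in combinations(profiles, 2)}
    clusters: List[Cluster] = [(name,) for name in profiles]
    merges: List[Tuple[Cluster, Cluster, float]] = []
    while len(clusters) > 1:
        best = None
        for ci, cj in combinations(clusters, 2):
            # average linkage: mean of all pairwise member distances
            d = sum(dist[frozenset((a, b))] for a in ci for b in cj) / (len(ci) * len(cj))
            if best is None or d < best[2]:
                best = (ci, cj, d)
        ci, cj, d = best
        clusters.remove(ci)
        clusters.remove(cj)
        clusters.append(ci + cj)
        merges.append((ci, cj, d))
    return merges
```

The merge list is enough to draw a dendrogram; profiles with r = 1 merge at distance 0, and perfectly anti-correlated profiles sit at the maximum distance of 2.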

Results and Discussion

Identification and characterization of DHN gene family in Populus

To identify DHN genes and their putative encoded polypeptides in the Populus genome, we first performed a keyword search for “dehydrin” against the P. trichocarpa genome database (www.jgi.doe.gov/poplar). Nine genes annotated as DHN genes were found (Nos. 1–9 in Table 1). Subsequently, to confirm the reliability of these annotations, the HMM profile (dehydrin.hmm) of the Pfam Dehydrin domain (PF00257) was used as a query to search the P. trichocarpa genome (www.jgi.doe.gov/poplar). A total of 10 non-redundant putative DHN genes were identified as encoding a significant dehydrin domain; eight of them (Nos. 1–8 in Table 1) were among the nine annotated genes, and two (Nos. 10 and 11; 817405 and 276757) were not. Detailed information on the poplar DHN gene family is listed in Table 1. For a simplified nomenclature, all genes (and their corresponding proteins) were named PtrDHN followed by a digit giving the gene number within the group (Table 1).

Table 1 All identified dehydrin genes and putative encoded polypeptides present in the Populus trichocarpa genome

The gene PtrDHN-9 (665494), annotated as “dehydrin” in P. trichocarpa v1.1, appeared to be incorrectly annotated, because its encoded polypeptide lacks a significant DHN domain. However, our sequence analysis indicated that its encoded protein shares high sequence similarity with three KS-type DHN proteins previously documented in Arabidopsis (Hundertmark and Hincha 2008), soybean (Alsheikh et al. 2005) and barley (Rodriguez et al. 2005), especially in their K-, S-, and Ф-segments (Fig. 1). Notably, a consecutive deletion of four amino acids (“KIKD”) was found in its K-segment (Fig. 1), which may explain why it did not match the DHN domain (PF00257) in our domain search. Given this small deletion in the K-segment and the high sequence identity with other plant DHN proteins, PtrDHN-9 was also classified as a DHN gene.

Fig. 1

Multiple sequence alignment of Populus PtrDHN-9 with other plant DHNs. K-, S-, and Ф-segments are marked with blue boxes under the corresponding labels. The consecutive four-residue deletion (“KIKD”) in the K-segment of PtrDHN-9 is marked with a red box under the label “deletion”. Gray shading indicates residues identical in at least 70% of the sequences. PtrDHN-9 (JGI Protein ID, 665494); AtDHN (At1g54410.1) from Arabidopsis; GmDHN (ABO70349.1) from soybean; HvDHN-13 (AAT81473.1) from barley

Thus, a total of 11 DHN genes were identified in the P. trichocarpa genome by this genome-wide survey (Table 1). The number of DHN genes in P. trichocarpa is roughly equal to that in Arabidopsis, which does not agree with the ratio of 1.4–1.6 putative Populus homologues per Arabidopsis gene reported in comparative genomic studies (Tuskan et al. 2006). That is, the expansion seen in many Populus multigene families (Tuskan et al. 2006) does not appear to have occurred in the Populus DHN gene family. One could speculate that the similar number of DHN genes in the two genomes reflects similar requirements for these genes in their specific stress-related functions.
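As a rough check of this comparison, assuming 10 Arabidopsis DHN genes (the count underlying the Arabidopsis proportions reported later in this study), the 1.4–1.6 ratio would predict:

```latex
N_{\text{Populus}}^{\text{expected}} \approx (1.4\text{--}1.6) \times N_{\text{Arabidopsis}}
  = (1.4\text{--}1.6) \times 10 = 14\text{--}16
```

compared with the 11 genes identified here.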

Revision of DHN-encoded proteins and discovery of alternative splicing in poplar DHN genes

Because the Populus genome (www.jgi.doe.gov/poplar) is still a draft, in which a first-draft reference set of 45,555 protein-coding gene loci was tentatively identified, the Populus gene set will need to be refined gradually (Tuskan et al. 2006). To verify our preliminary identification of the eleven DHN genes from the JGI poplar database, their encoded proteins were further compared by BLASTP against the NCBI Reference Sequence (RefSeq) database, which provides a non-redundant, validated collection of genomic, transcript and protein sequences (Pruitt et al. 2006, 2005). Three poplar DHN proteins (PtrDHN-8, PtrDHN-11, and PtrDHN-9) had no counterparts in the RefSeq database (Table 1) and may therefore represent truncated or incorrect protein models. Their corresponding ESTs were retrieved by online BLASTN searches to support and correct these models for further analysis. ESTs from NCBI that perfectly matched the CDS sequences, particularly the nucleotide sequences encoding the K-segment, were selected for alignment with their respective transcript/CDS from P. trichocarpa v1.1 (Electronic Supplementary Material (ESM) Figs. S1–S3). For the PtrDHN-9 transcript (665494), numerous ESTs support “ATG” at positions 49–51 as the translation start codon and “TAA” at positions 337–339 as the stop codon (ESM Fig. S1). Accordingly, the amino acids encoded downstream of this “TAA” were removed from the original PtrDHN-9 protein sequence (ESM Fig. S1 and ESM Table S1). The PtrDHN-8 protein (195568) had an incomplete N-terminus because its model lacked an “ATG” start codon; our EST alignments and comparative analyses clearly showed that the PtrDHN-8 transcript should be extended upstream of its first three nucleotides (“GCC”) by an “ATG” initiation codon encoding Met, followed by “GCT” encoding Ala (ESM Fig. S2 and ESM Table S2). Moreover, ESTs strongly supported “TAG” at positions 394–396 as its stop codon (ESM Fig. S2). The revised CDS and encoded protein sequence of PtrDHN-8 are given in ESM Table S2. The gene PtrDHN-11 (276757) had no significant EST match, and a “TAG” stop codon at positions 196–198 of its transcript caused early translation termination (ESM Fig. S3 and ESM Table S3). Based on this revised CDS, the amino acid sequence encoded upstream of the stop codon did not match any DHN domain (ESM Fig. S3 and ESM Table S3). PtrDHN-11 was therefore excluded from the 11 DHN genes identified above, but was designated a putative DHN pseudogene because of its high sequence identity with another DHN gene, PtrDHN-1 (550802). In summary, two of the three problematic transcripts (PtrDHN-9 and PtrDHN-8) were confirmed by EST support with high confidence and revised into complete proteins, whereas the third (PtrDHN-11, 276757) was identified as a DHN pseudogene.
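The coordinate-based revisions described above (trimming a transcript to an EST-supported start and stop codon, then translating) can be sketched as follows. For PtrDHN-9, an ATG at positions 49–51 and a TAA at 337–339 define a 291-nt CDS (97 codons), that is, a 96-residue protein plus the stop codon. The helper names and the synthetic test transcript are hypothetical, and the codon table is the standard genetic code. For a case like PtrDHN-8, the missing start would be restored by prepending the codons before trimming (for example `"ATGGCT" + transcript`).

```python
from typing import Dict

BASES = "TCAG"
# Standard genetic code: amino acids for codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def extract_cds(transcript: str, start: int, stop_end: int) -> str:
    """Slice a CDS using 1-based inclusive coordinates (first base of ATG .. last base of stop)."""
    cds = transcript[start - 1:stop_end].upper()
    if not cds.startswith("ATG"):
        raise ValueError("CDS does not begin with ATG")
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    if cds[-3:] not in STOP_CODONS:
        raise ValueError("CDS does not end with a stop codon")
    return cds


def translate(cds: str) -> str:
    """Translate codon by codon, stopping at the first in-frame stop codon."""
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[pos:pos + 3]]  # KeyError for ambiguous bases such as N
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

The checks in `extract_cds` catch the kinds of errors discussed above: a model without an ATG, a frame shift, or a missing terminator.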

Alternative transcript processing, as a mechanism regulating gene expression, plays a direct role in plant development (Wang and Brendel 2006). Although recent computational studies in Arabidopsis and rice have estimated that over 20% of genes are alternatively spliced in both species (Filichkin et al. 2010; Iida et al. 2004; Wang and Brendel 2006), alternative splicing of DHN genes has not been reported. In our study, a cDNA sequence for a putative dehydrin (AJ300525.4) from Populus euramericana, which had not previously been mapped to the poplar genome, showed very high identity with the CDS of PtrDHN-1 (550802). Comparing both sequences with the genomic sequence of PtrDHN-1 revealed alternative splicing of this gene, with the cDNA (AJ300525.4) and the CDS representing two splicing isoforms (Fig. 2a and Table 1). Their encoded products are PtrDHN-1.2 (CAC18724.4) and PtrDHN-1.1 (XP_002300665.1), respectively. To characterize the splicing variation, exon and intron positions were determined from the alignment with the genomic sequence; the intron/exon structures and protein isoforms of the two alternatively spliced transcripts are shown in Fig. 2a and b. Thus, alternative splicing of PtrDHN-1 (550802) yields two protein isoforms, PtrDHN-1.1 and PtrDHN-1.2 (Table 1). In total, our genome-wide investigation identified 10 DHN genes encoding 11 DHN proteins.

Fig. 2

Schematic representation of the intron/exon structure of the alternatively spliced transcripts of the PtrDHN-1 gene and their protein isoforms. a Intron/exon structure of the two alternatively spliced transcripts (PtrDHN-1.1 and PtrDHN-1.2) of PtrDHN-1. b Sequence alignment of the protein isoforms encoded by the two transcripts, comparing the peptides encoded by each exon. Exon sequences are represented by black boxes numbered E1–E9, and intron sequences by gray lines numbered I1–I7. The length (bp) of each exon and intron is shown below it and can also be estimated from the scale at the top. The names of the alternatively spliced transcripts are shown on the left and their chromosomal locations on the right. E1–E9 in (b) denote the peptides encoded by each exon

Chromosomal location and duplication of DHN genes in Populus

In silico mapping of the gene loci showed that, except for PtrDHN-5 and PtrDHN-11, which were assigned to individual scaffolds, the DHN genes were distributed across 7 of the 19 linkage groups (LGs; Table 1 and Fig. 3). Their distribution among these LGs is uneven: LGs II, III, IV, V, IX, and XIX each carry only one DHN gene, whereas LG XIII has a high density of DHN genes, with three (PtrDHN-4, -6 and -7) organized in one cluster (Cluster I) within a 20-kb fragment (Fig. 3).

Fig. 3

Chromosomal location of the Populus DHN genes. Nine genes are mapped to 7 of the 19 linkage groups (LGs), while the other two are located on unassembled scaffolds. The schematic of genome-wide chromosome organization arising from the whole-genome duplication event in Populus is taken from Tuskan et al. (2006). Segmentally duplicated homologous regions are shown in the same color. Only duplication blocks containing DHN genes are connected with lines in shaded colors. The three tandemly duplicated genes within 20 kb, marked with a red box, form one cluster (Cluster I). The scale at the bottom represents a chromosomal distance of 5 Mb

Previous analysis of the Populus genome identified paralogous segments resulting from the whole-genome duplication event in the Salicaceae (the salicoid duplication), which occurred about 65 million years ago and contributed substantially to the amplification of many multigene families (Tuskan et al. 2006). To determine the relationship between DHN genes and these paralogous segments, the Populus DHN genes were mapped onto the duplicated blocks of P. trichocarpa established by Tuskan et al. (2006). The distribution of DHN genes relative to the duplicated blocks is shown in Fig. 3. All nine mapped DHN genes (100%) are located in duplicated blocks. Two duplicated pairs (PtrDHN-1/3 and PtrDHN-2/8) each lie in a pair of paralogous blocks and can be considered direct results of the segmental duplication event (Fig. 3). Similarly, Cluster I and PtrDHN-9 lie in a pair of paralogous blocks created by the whole-genome duplication (Fig. 3). In one pair of duplicated blocks, a DHN gene (PtrDHN-10) is present on only one block and lacks a corresponding duplicate, suggesting that its paralog was lost after the segmental duplication event (Fig. 3). This is consistent with the observation that gene loss in eukaryotes is most extensive following whole-genome duplication (Abdel-Haleem 2007).

Tandem duplication also contributed to the expansion of the DHN gene family. LG XIII contains one DHN cluster (Cluster I) of three genes arranged in tandem, in the same orientation, within a 20-kb fragment (Table 1 and Fig. 3). Given their high sequence identity, the three tandem DHN genes in Cluster I were considered direct results of tandem duplication events. Because the cluster lies in a duplicated block whose paralogous block carries only a single DHN gene (PtrDHN-9), the segmental duplication probably preceded the tandem duplication. Thus, according to the genomic organization of DHN genes, both segmental and tandem duplication events contributed to the expansion of the DHN gene family in Populus, as has also been shown for DHN genes in Arabidopsis (Hundertmark and Hincha 2008) and rice (Wang et al. 2007).

In our study, the Populus DHN gene family has been preferentially retained at a rate of 100%, whereas only about one-third of putative genes in the Populus genome are retained in duplicated blocks resulting from the whole-genome duplication (Tuskan et al. 2006). High retention rates of duplicated genes have also been documented for other Populus gene families (Hu et al. 2010; Barakat et al. 2009; Kalluri et al. 2007). In addition, segmental duplication accounts for considerably more DHN genes than tandem duplication, suggesting that segmental duplication may have been the main mechanism driving the expansion of Populus DHN genes.

Identification of conserved motifs and classification of Populus DHN proteins

The conserved K-segment is the most distinctive motif of all DHNs, while the S- and Y-segments are also recognized as important motifs (Rorat 2006; Close 1996). To reveal the motifs present in Populus DHN proteins, motifs were extracted with MEME from a total of 34 DHN protein sequences from poplar, Arabidopsis (Hundertmark and Hincha 2008), and barley (Choi et al. 1999; Rodriguez et al. 2005; Fig. 4, ESM Tables S4, S5 and S6). Six significant motifs were retrieved (ESM Fig. S4a–f), of which motif-1 (ESM Fig. S4a) and motif-4 (ESM Fig. S4d) were identified as the K- and Y-segments, respectively, based on their high identity with previously described K- and Y-motifs of DHNs. Motif-2 (ESM Fig. S4b) and motif-3 (ESM Fig. S4c), both characterized by a tract of Ser residues, were considered S-segments; to distinguish them, motif-2 (eight residues wide) is designated “S-segment” and motif-3 (16 residues wide) “S-segment”. In contrast, motif-5 and motif-6 were identified as novel DHN motifs because no homologous motifs were found in the Pfam and SMART databases (ESM Fig. S4e and S4f). Further analysis showed that motif-5, besides occurring in Populus DHNs (4/11; Fig. 5b), is also widespread in barley (7/13) and rice (4/6) DHNs but absent from Arabidopsis DHNs (ESM Fig. S5). In contrast, motif-6 is rare: apart from Populus DHNs (3/11), it occurs in only one Arabidopsis DHN (At1g20440.1, YSK3; ESM Fig. S5). Whether these two novel motifs confer unique functional roles on DHNs remains to be investigated.

Fig. 4

Amino acid sequence alignment of all identified poplar DHNs. K-, Y-, and S-segments are marked with open blue, yellow, and red boxes, respectively

Fig. 5

Phylogenetic relationships and motif compositions of poplar DHN proteins. a Phylogenetic analysis of poplar DHN proteins. Neighbor-joining and minimum-evolution bootstrap values above 55% are shown in red above and below the branches, respectively. All poplar DHN protein names and their corresponding ID numbers used in the phylogenetic analysis are listed in Table 1. b Schematic view of the conserved motifs in Populus DHN proteins identified by MEME (Bailey and Elkan 1994). Each motif is represented by a capital letter or number in a colored box: K represents the K-segment, S and S represent the two S-segment motifs, Y represents the Y-segment, 5 represents motif-5, and 6 represents motif-6. The height of each motif “block” is proportional to −log(p value), truncated at the height for a motif with a p value of 1e−10. Black lines represent non-conserved sequences. See ESM Fig. S4 for details of each motif

Based on the generally accepted classification of DHNs (Rorat 2006; Close 1996), the eleven poplar DHNs were assigned to four of the five subgroups. The Kn subgroup was the most numerous, with five members; YnSKn and KnS each had two members, and SKn had just one (Table 2, Fig. 4, ESM Table S4 and Fig. 5b). Interestingly, the remaining protein (PtrDHN-10), with an SKS composition, cannot be assigned to any single DHN subgroup (Fig. 5b and ESM Table S4). It probably represents an intermediate form between SKn and KnS, similar to that documented for a DHN protein of Stellaria longipes (Z21500; SK3S; Close 1996). Notably, one Kn-subgroup member (PtrDHN-1.2) contains 13 repeated K-segments (K13; Fig. 5b), exceeding the maximum previously documented, 11 copies in spinach CAP85 (K11; Kaye et al. 1998). The Kn subgroup accounts for a much larger proportion of poplar DHNs (5/11) than of Arabidopsis (1/10) or barley (1/13) DHNs (Fig. 4, ESM Fig. S5, ESM Tables S4, S5 and S6). Kn-subgroup DHNs from various plants have been shown to be induced by low temperature, dehydration, and ABA, and appear to be directly involved in cold acclimation (Kosova et al. 2007; Rorat 2006). The enrichment of Kn-subgroup DHNs in the Populus genome therefore suggests that woody plants may require more DHNs directly involved in cold acclimation, which may explain why woody plants such as Populus contain more cold-inducible Kn-type DHN genes than herbaceous plants. Furthermore, the YnSKn subgroup has the highest proportional occurrence in Arabidopsis (5/10) and barley (9/13; ESM Tables S5 and S6), whereas only two of the eleven Populus DHNs (2/11) belong to it (Table 2 and ESM Table S4). One possible explanation is that the larger number of Kn-subgroup DHNs in Populus, relative to Arabidopsis and barley, resulted from the loss of Y-segment-encoding sequences in Populus DHN genes after Populus diverged from Arabidopsis and barley.
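The subgroup assignment described above can be sketched as a simple rule set over the ordered Y-, S- and K-segments found by MEME. This is a hypothetical implementation of the generally accepted scheme, not the authors' procedure: Y-segments are assumed to be N-terminal, other motifs (Ф-segments, motif-5, motif-6) are ignored, and any arrangement that fits none of the five subgroups (such as SKS) is reported as intermediate.

```python
from itertools import groupby
from typing import List, Sequence

CORE = ("Y", "S", "K")


def architecture(segments: Sequence[str]) -> str:
    """Compress ordered Y/S/K segments into a label, e.g. Y,Y,Y,S,K,K -> 'Y3SK2'."""
    core = [s for s in segments if s in CORE]  # drop non-classifying motifs
    parts: List[str] = []
    for letter, run in groupby(core):
        n = len(list(run))
        parts.append(letter if n == 1 else f"{letter}{n}")
    return "".join(parts)


def subgroup(segments: Sequence[str]) -> str:
    """Assign YnSKn, YnKn, SKn, Kn or KnS from segment order, else 'intermediate'."""
    core = [s for s in segments if s in CORE]
    k_pos = [i for i, s in enumerate(core) if s == "K"]
    if not k_pos:
        return "no K-segment"
    s_pos = [i for i, s in enumerate(core) if s == "S"]
    has_y = "Y" in core
    s_before = any(i < k_pos[0] for i in s_pos)
    s_after = any(i > k_pos[-1] for i in s_pos)
    s_between = any(k_pos[0] < i < k_pos[-1] for i in s_pos)
    if s_between or (s_before and s_after):
        return "intermediate"  # e.g. SKS or SK3S
    if has_y:
        if s_after:
            return "intermediate"
        return "YnSKn" if s_before else "YnKn"
    if s_before:
        return "SKn"
    if s_after:
        return "KnS"
    return "Kn"
```

For example, `architecture(list("YYYSKK"))` gives `"Y3SK2"` and `subgroup(list("SKS"))` gives `"intermediate"`, matching how PtrDHN-10 is treated in the text.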

Table 2 Biochemical properties of all identified poplar DHN proteins

Divergence within Populus DHN genes

Unrooted trees were generated by both the neighbor-joining (Saitou and Nei 1987) and minimum-evolution methods in MEGA 5.0 (Tamura et al. 2011) from the complete protein sequences of all Populus DHNs. The topologies produced by the two methods were identical at all branches and supported by bootstrap values above 55%, suggesting a reliable unrooted tree in which the 11 poplar DHNs fall into four distinct clans: Types I, II, III, and IV (Fig. 5a). These four types largely correspond to the subgroups identified by the motif analysis above. PtrDHN-2 and PtrDHN-8, of the YnSKn subgroup, were assigned to Type II, and PtrDHN-10 of the SKnS subgroup, an intermediate form between SKn and KnS, to Type IV. Type I contains two KnS-subgroup DHNs (PtrDHN-6 and -9) and three Kn-subgroup DHNs (PtrDHN-7, -4, and -5; Fig. 5a). These three Kn DHNs differ from the other two Kn DHNs, PtrDHN-1.1 (K9) and -1.2 (K13), by the presence of a novel repeating motif (motif-5; Fig. 5 and ESM Fig. S4e). PtrDHN-1.1 and -1.2, together with the SKn-subgroup PtrDHN-3, were assigned to Type III, all sharing the other novel motif (motif-6; Fig. 5 and ESM Fig. S4f). The similar conserved motifs of DHN proteins within each type provide additional support for the tree topology. Moreover, the proteins encoded by each paralogous pair fall into the same type: PtrDHN-1/3 into Type III, PtrDHN-2/8 into Type II, and Cluster I/PtrDHN-9 into Type I. This further supports the view that both segmental and tandem duplication events drove the expansion of the DHN gene family in the Populus genome.
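The distance step behind the neighbor-joining and minimum-evolution trees (p-distance with complete deletion) can be sketched as follows; tree building itself is left to MEGA. The example alignment is hypothetical, and gaps are assumed to be written as `-`.

```python
from itertools import combinations
from typing import Dict, Tuple


def complete_deletion(aligned: Dict[str, str]) -> Dict[str, str]:
    """Remove every alignment column that has a gap ('-') in any sequence."""
    lengths = {len(s) for s in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    ncol = lengths.pop()
    keep = [i for i in range(ncol) if all(s[i] != "-" for s in aligned.values())]
    return {name: "".join(seq[i] for i in keep) for name, seq in aligned.items()}


def p_distances(aligned: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Pairwise p-distance: proportion of differing sites among the retained columns."""
    trimmed = complete_deletion(aligned)
    nsites = len(next(iter(trimmed.values())))
    if nsites == 0:
        raise ValueError("no gap-free columns remain after complete deletion")
    return {
        (a, b): sum(x != y for x, y in zip(trimmed[a], trimmed[b])) / nsites
        for a, b in combinations(trimmed, 2)
    }
```

Complete deletion discards any column with a gap in any sequence, so all pairwise distances are computed over the same set of sites; with highly variable DHN sequences this can remove a large share of the alignment, which is worth checking before trusting the tree.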

Biochemical properties of poplar DHN proteins

Generally, DHNs are characterized by abundant Gly and polar amino acids and a lack of Cys and Trp (Close 1997, 1996). Analysis of amino acid composition showed that all poplar DHN proteins share this feature, with one exception: PtrDHN-10 (817405) of the SKS subgroup has a relatively high Cys content (4.6%; ESM Table S7). Together with their low GRAVY values, ranging from −1.995 to −0.880 (Table 2), this confirms the highly hydrophilic nature of Populus DHN proteins, in agreement with other plant DHNs (Kosova et al. 2007). For example, in the Kn-subgroup protein PtrDHN-7 (K3; 19.1 kDa), Gly, Gln, Lys, Glu, and Asp together account for 60.0% of all amino acids, and no Cys or Trp is present (Table 2 and ESM Table S7). The calculated molecular masses of poplar DHN proteins range from 10.7 to 68.9 kDa; most (9/11) are relatively small (10–26 kDa), and only two are larger (50.8 and 68.9 kDa; Table 2). However, because of their unusual amino acid composition, DHNs often show an apparent MW on electrophoretic gels considerably higher than the actual MW calculated from their amino acid sequence (Close 1997; Kosova et al. 2007). For example, barley DHN5 migrates at about 84 kDa on SDS gels relative to protein markers, although its actual MW is only 58.5 kDa (Kosova et al. 2007). Further experiments are therefore required to relate the actual and apparent MW of each poplar DHN.
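GRAVY is the average Kyte–Doolittle hydropathy over all residues, so values like those in Table 2, and composition percentages like the 60.0% quoted above, can be computed as below. This sketch uses the published Kyte–Doolittle scale; the function names are hypothetical, and results may differ slightly from whichever tool produced Table 2 (for example, in how non-standard residues are handled).

```python
from typing import Iterable

# Kyte & Doolittle (1982) hydropathy index
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    seq = protein.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def percent_of(protein: str, residues: Iterable[str]) -> float:
    """Percentage of the sequence made up of the given residue types."""
    seq = protein.upper()
    return 100.0 * sum(seq.count(aa) for aa in set(residues)) / len(seq)
```

For example, `percent_of(seq, "GQKED")` gives the combined share of Gly, Gln, Lys, Glu and Asp; a strongly negative `gravy(seq)` indicates a hydrophilic protein.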

In addition, the isoelectric point (pI) is considered an important property for subdividing plant DHNs, because DHNs of the same subgroup that differ in acidic or basic character may respond to different environmental factors (Allagulova et al. 2003). Theoretical pI values of Populus DHNs vary widely, from 5.01 to 9.96, with five acidic, five basic and one neutral DHN (Table 2), consistent with the pI range of barley DHNs (5.21–9.52; Kosova et al. 2007).
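A theoretical pI is the pH at which the net charge computed from residue pKa values is zero, and it can be found by bisection. This is a sketch assuming one textbook pKa set (listed in the code); tools such as ExPASy ProtParam use a different scale, so the numbers will not reproduce Table 2 exactly. The acidic/basic/neutral cut-offs in `classify` are likewise hypothetical, since the thresholds behind Table 2 are not stated.

```python
from typing import Dict

# Assumed pKa values (a commonly quoted textbook set; other scales differ)
POSITIVE_PKA: Dict[str, float] = {"N_term": 9.69, "K": 10.53, "R": 12.48, "H": 6.00}
NEGATIVE_PKA: Dict[str, float] = {"C_term": 2.34, "D": 3.65, "E": 4.25, "C": 8.18, "Y": 10.07}


def net_charge(protein: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of a protein at the given pH."""
    seq = protein.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["N_term"] = counts["C_term"] = 1  # one free amino and one free carboxyl terminus
    positive = sum(counts[g] / (1.0 + 10 ** (ph - pka)) for g, pka in POSITIVE_PKA.items())
    negative = sum(counts[g] / (1.0 + 10 ** (pka - ph)) for g, pka in NEGATIVE_PKA.items())
    return positive - negative


def isoelectric_point(protein: str, tol: float = 1e-4) -> float:
    """Bisection for the pH where net charge is zero (net charge falls as pH rises)."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(protein, mid) > 0:
            low = mid   # still positive: pI lies above mid
        else:
            high = mid  # negative or zero: pI lies at or below mid
    return (low + high) / 2


def classify(pi: float, acidic_below: float = 6.5, basic_above: float = 7.5) -> str:
    """Hypothetical cut-offs for labelling a DHN as acidic, neutral or basic."""
    if pi < acidic_below:
        return "acidic"
    if pi > basic_above:
        return "basic"
    return "neutral"
```

Bisection works here because the net charge decreases monotonically with pH, so exactly one zero crossing lies between pH 0 and 14.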

Tissue distribution of DHN gene expression in Populus

Numerous studies have shown that DHNs not only accumulate during seed desiccation and in response to water deficit induced by drought, low temperature, or salinity, but are also present in nearly all vegetative tissues under optimal growth conditions (Kosova et al. 2007; Rorat 2006; Tunnacliffe et al. 2010). To investigate the expression patterns of all poplar DHN genes in normal developmental tissues, we reanalyzed the poplar Affymetrix microarray data (Wilkins et al. 2009) using the probe sets matched to each poplar DHN (ESM Table S8). Consistent with these studies, poplar DHN genes were expressed in nearly all tissues except mature leaves (ML; Fig. 6). Notably, the largest fractions of DHN genes were preferentially expressed in continuous light-grown seedlings (CL; 7/11), male catkins (MC; 6/11), female catkins (FC; 6/11), and young leaves (YL; 6/11). Smaller fractions were expressed in dark-grown seedlings (DS; 4/11), xylem (X; 4/11), etiolated dark-grown seedlings transferred to light for 3 h (DL; 3/11), and roots (R; 2/11).
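The probe-set rule stated in the Methods (the mean over multiple probe sets for one gene; identical values for genes sharing a probe set) can be sketched as below. The gene names, probe-set IDs and values are hypothetical; each probe set is assumed to carry one abundance value per tissue, in the same tissue order.

```python
from statistics import mean
from typing import Dict, List


def gene_profiles(gene_to_probes: Dict[str, List[str]],
                  probe_values: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Per-tissue abundance for each gene as the mean over its matched probe sets.

    Genes mapped to the same probe set reuse that probe set's values, so they
    receive identical profiles.
    """
    profiles: Dict[str, List[float]] = {}
    for gene, probes in gene_to_probes.items():
        rows = [probe_values[p] for p in probes]
        # zip(*rows) groups the values of all probe sets tissue by tissue
        profiles[gene] = [mean(tissue_values) for tissue_values in zip(*rows)]
    return profiles
```

The resulting per-gene profiles are what a heat map or a Pearson-based clustering would then take as input.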

Fig. 6

Relative transcript abundance profiles of Populus DHN genes across different tissues. The heat map of transcript abundance was produced from the genome-wide microarray data of Wilkins et al. (2009). Transcript abundance levels of the Populus DHN genes were grouped by hierarchical clustering based on Pearson correlation. The color scale at the bottom of each dendrogram represents log2 expression values; green indicates low and red high transcript abundance. Populus DHNs of the same subgroup are marked in the same color. ML mature leaf; YL young leaf; R root; DS dark-grown seedlings; DL etiolated dark-grown seedlings transferred to light for 3 h; CL continuous light-grown seedlings; FC female catkins; MC male catkins; X xylem

Furthermore, several previous studies in different species have indicated that different types of DHN proteins can localize to the same tissues during development under normal growth conditions (Battaglia et al. 2008; Rorat 2006). Our in silico expression analysis of all poplar DHN genes supports this conclusion: for example, the KnS-type PtrDHN-6 shows the same tissue expression pattern as the Kn-type PtrDHN-4 and -7, and the SKnS-type PtrDHN-10 the same pattern as the YnSKn-type PtrDHN-2 and -8 (Fig. 6). However, we also found that Populus DHN genes of the same type tend to be preferentially expressed in the same tissues under normal growth conditions. For example, PtrDHN-4, -7, -1.1, and -1.2, all of the Kn type, share similar expression patterns, being preferentially expressed in MC, FC and YL, with little accumulation in ML, R, CL, DL, and DS. PtrDHN-2 (Y3SK2) and -8 (Y3SK), both of the YnSKn type, share the same expression pattern, with the highest transcript abundance in seedlings under specific conditions (CL, DL, and DS), consistent with this type of DHN in other plants, such as Indian mustard BjDHN1 (Y3SK2) and oilseed rape BnDHN1 (Y3SK2; Yao et al. 2005). The observation that poplar DHN genes of the same type tend to share expression patterns across the nine tissues under normal growth conditions provides a useful resource for exploring the relationship between DHN type and tissue localization.

Conclusion

Considerable research effort has been devoted to characterizing DHNs in herbaceous plants such as barley, rice, and Arabidopsis, but comparable effort has not yet been directed towards woody trees. In this work, we addressed this gap through genome-wide identification and in silico analysis of the DHN gene family in P. trichocarpa. This comprehensive analysis provides an important starting point for future efforts to elucidate the functional roles of all DHN proteins in poplar.