Introduction

Expansins comprise a plant-specific superfamily of proteins that characteristically loosen the plant cell wall by weakening the non-covalent bonding of polysaccharides to one another (Cosgrove 2000, 2005; Li et al. 2003; McQueen-Mason and Cosgrove 1994; McQueen-Mason et al. 1992; Sampedro and Cosgrove 2005). Expansins play important roles in plant growth and resistance mechanisms, including root growth (Brotman et al. 2008; Guo et al. 2011; Lin et al. 2011), fruit softening (Brummell et al. 1999; Harrison et al. 2001), leaf growth (Cho and Cosgrove 2000; Lu et al. 2013), seed yield (Bae et al. 2014), and drought and stress responses (Han et al. 2012; Lu et al. 2013). They are also involved in cell enlargement and cell wall changes induced by plant hormones such as gibberellin (Chen et al. 2001), ABA and auxin (Cox et al. 2004), cytokinin (Lee et al. 2008), ethylene (Vreeburg et al. 2005) and brassinosteroids (Park et al. 2010).

Integral expansins range from 250 to 275 amino acids and are composed of three domains: a signal peptide, which is presumably removed during processing in the endoplasmic reticulum (Sampedro and Cosgrove 2005); domain 1, which includes a series of conserved cysteines and a His-Phe-Asp (HFD) motif (Cosgrove 1997, 2000); and domain 2, which is distantly related to group-2 grass pollen allergens (Cosgrove 2000). Based on phylogenetic and conserved sequence analyses, the expansin superfamily has been classified into four families: EXPA, EXPB, EXLA and EXLB (Kende et al. 2004; Sampedro and Cosgrove 2005).

Wheat (Triticum aestivum L.) is one of the most important crops in the world and is relied upon for human food, animal feed and starch ethanol production. In contrast to the extensive research conducted on expansin in both model and crop plants such as Arabidopsis, rice, poplar, maize and apple (Hemalatha et al. 2011; Zhang et al. 2014a, b), there are only very limited reports on the characterization of expansin in wheat. Recently, the draft genome sequence of wheat was decoded, which has provided an excellent opportunity for genome-wide analyses of all genes belonging to specific gene families (Mayer 2014; Haider et al. 2013). However, no genome-wide information is currently available on the wheat expansin gene family.

Given the importance of expansins in diverse biological and physiological processes and their potential application in the development of transgenic plants, we carried out a systematic analysis of the wheat expansin family in the present study for the first time. The chromosomal locations and gene structures of the putative expansin genes predicted by genome-wide surveys of the wheat genomic sequences were carefully analyzed. Additionally, the putative expansin genes were subjected to phylogenetic analyses with their Arabidopsis, rice and maize counterparts. These comparisons enabled the identification of gene orthologs and clusters of orthologous groups that can be subjected to further functional characterization. To our knowledge, this is the first reported genome-wide analysis of the wheat expansin family, which can provide valuable information for understanding the classification and putative functions of TaEXPs. These findings open doors to further improvement of quality and yield in wheat via genetic engineering.

Materials and methods

The identification of expansin genes in wheat

To identify the members of the expansin gene family in T. aestivum, two different approaches were used. First, all of the known Arabidopsis and rice expansin gene sequences, which were downloaded from the Arabidopsis genome TAIR 9.0 release (The Arabidopsis Information Resource, http://www.arabidopsis.org/) (Poole 2007) and the Rice Genome Annotation Project database (Lamesch et al. 2012), were used as query sequences to perform multiple database searches against the proteome and genome files downloaded from PlantGDB (http://www.plantgdb.org/) (Dong et al. 2004). Stand-alone versions of BLASTP and TBLASTN (Basic Local Alignment Search Tool: http://blast.ncbi.nlm.nih.gov) (Altschul et al. 1990), which are available from the National Center for Biotechnology Information (NCBI), were used with an e-value cutoff of 1e−003. All of the protein sequences derived from the collected candidate expansin genes were examined using the domain analysis programs Pfam (Protein family: http://pfam.sanger.ac.uk/) (Punta et al. 2012) and SMART (Simple Modular Architecture Research Tool: http://smart.embl-heidelberg.de/) (Letunic et al. 2012) with the default cutoff parameters. Second, we analyzed the domains of all of the wheat peptide sequences using a Hidden Markov Model (HMM) (Jeanmougin et al. 1998; Wu et al. 2002) analysis with Pfam searching. We then used a Perl-based script to extract from the wheat genome sequences those sequences annotated with the Pfam accessions PF03330 and PF01357, which correspond to the typical DPBB_1 and Pollen_allerg_1 domains. Finally, all of the protein sequences were compared with known expansin sequences using ClustalX (http://www.clustal.org/) to verify that they were candidate expansins (Jeanmougin et al. 1998). Cellular roles and gene ontology (GO) categories were predicted using the ProtFun2.2 server (http://www.cbs.dtu.dk/services/ProtFun/), which relies solely on the protein sequence as input (Jensen et al. 2002, 2003).
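
The two filters described above (an e-value cutoff on BLAST hits and the presence of both Pfam domains) can be sketched as follows. This is only an illustration, not the Perl script used in the study: it assumes BLAST hits in the standard 12-column tabular layout and a hypothetical mapping from protein ID to Pfam accessions. The function names, the sample IDs and the choice to combine the two filters by intersection are all assumptions made for the example.

```python
# Minimal sketch (not the Perl script used in the study): keep candidates
# that pass the BLAST e-value cutoff and carry both expansin Pfam domains.

EVALUE_CUTOFF = 1e-3                      # cutoff stated in the text
REQUIRED_PFAM = {"PF03330", "PF01357"}    # DPBB_1 and Pollen_allerg_1


def blast_candidates(tabular_lines, cutoff=EVALUE_CUTOFF):
    """Return subject IDs with at least one hit at or below the cutoff.

    Assumes the standard 12-column BLAST tabular layout, in which
    column 2 is the subject ID and column 11 is the e-value.
    """
    hits = set()
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        subject_id, evalue = fields[1], float(fields[10])
        if evalue <= cutoff:
            hits.add(subject_id)
    return hits


def has_expansin_domains(pfam_accessions, required=REQUIRED_PFAM):
    """True if every required Pfam accession is present."""
    return required <= set(pfam_accessions)


# Hypothetical example data (all IDs and scores are invented).
blast_lines = [
    "AtEXPA1\twheat_p1\t62.5\t240\t90\t0\t1\t240\t5\t244\t1e-90\t310",
    "AtEXPA1\twheat_p2\t30.1\t80\t50\t2\t10\t89\t3\t80\t0.5\t28",
    "OsEXPB2\twheat_p3\t55.0\t250\t110\t1\t1\t250\t1\t250\t2e-70\t250",
]
pfam_hits = {
    "wheat_p1": ["PF03330", "PF01357"],
    "wheat_p2": ["PF03330", "PF01357"],
    "wheat_p3": ["PF03330"],  # lacks the Pollen_allerg_1 domain
}

candidates = sorted(
    pid for pid in blast_candidates(blast_lines)
    if has_expansin_domains(pfam_hits.get(pid, []))
)
print(candidates)
```

In this toy input, wheat_p2 fails the e-value cutoff and wheat_p3 lacks one of the two domains, so only wheat_p1 is retained.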

Chromosomal location and gene structure of the expansin genes in wheat

The chromosomal locations were retrieved from the genome data downloaded from the wheat genome sequences database using a Perl-based program and mapped to the chromosomes using Circos (Krzywinski et al. 2009). The gene structures of the expansin genes were generated with the Gene Structure Display Server (GSDS: http://gsds.cbi.pku.edu.cn/) (Guo et al. 2007).

Sequence alignment and phylogenetic analysis

The expansin sequences were aligned using the ClustalX program with BLOSUM30 as the protein-weight matrix (Jeanmougin et al. 1998). The Multiple Sequence Comparison by Log-Expectation (MUSCLE) program (version 3.52) was also used to perform multiple sequence alignments to confirm the ClustalX results (Edgar 2004). Phylogenetic trees of the expansin protein sequences were constructed using the neighbor-joining (NJ) method in the Molecular Evolutionary Genetics Analysis program (MEGA5, http://www.megasoftware.net/) with the p-distance model and the complete deletion option (Tamura et al. 2011). The reliability of the obtained trees was tested using a bootstrapping method with 1000 replicates. Phylogenetic and chromosomal location analyses were used to identify duplicated genes. The number of nonsynonymous substitutions per nonsynonymous site (Ka) and synonymous substitutions per synonymous site (Ks) were calculated with DnaSP (Librado and Rozas 2009; Rozas 2009).
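
To make the distance settings concrete, the sketch below computes p-distances after complete deletion on a toy alignment. It is a simplified illustration rather than MEGA5's implementation: only "-" is treated as a gap (missing-data symbols are ignored), and the sequence names and residues are invented.

```python
# Minimal sketch of p-distance with complete deletion on a toy alignment.
# Not MEGA5's code; only "-" is treated as a gap character here.

def complete_deletion(aligned):
    """Drop every alignment column that has a gap in any sequence."""
    seqs = list(aligned.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    keep = [i for i in range(length) if all(s[i] != "-" for s in seqs)]
    return {name: "".join(seq[i] for i in keep) for name, seq in aligned.items()}


def p_distance(a, b):
    """Proportion of compared sites at which two sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    differences = sum(1 for x, y in zip(a, b) if x != y)
    return differences / len(a)


# Hypothetical aligned protein fragments (invented for illustration).
alignment = {
    "seqA": "MKTA-LLV",
    "seqB": "MKSAGLLV",
    "seqC": "MRTAGL-V",
}

clean = complete_deletion(alignment)   # columns 5 and 7 are removed
d_ab = p_distance(clean["seqA"], clean["seqB"])
d_bc = p_distance(clean["seqB"], clean["seqC"])
print(clean, d_ab, d_bc)
```

Complete deletion discards a column for all sequences even when only one sequence has a gap there, so every pairwise distance is computed over the same set of sites.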

Expression analyses of the expansin genes in microarray

The developmental and tissue expression data for various tissues/organs and developmental stages were obtained using Genevestigator (https://www.genevestigator.com/gv/) with the T. aestivum gene chip platforms. The identified expansin gene IDs were used as query sequences to perform searches in the gene chip platform of Genevestigator (Grennan 2006).

Results

Identification of expansin genes in wheat

To identify genes encoding expansins in the wheat genome, BLASTP searches were performed against the entire set of wheat peptide sequences using expansins from three other well-studied species (Arabidopsis, rice and maize) as queries. The HMM profiles of the SMART and Pfam tools were then used to confirm the putative expansin genes. Finally, 98 typical expansin genes containing full open reading frames (ORFs) were identified and manually analyzed using the InterProScan and ClustalX programs to confirm the presence of the conserved expansin domains, and were used for further analysis as described below. In order to distinguish them from the remaining expansin genes, we provisionally named them TaEXPA1, TaEXPA2, TaEXPB1 and so on, based on their genome locations (Table S1). The ORFs ranged from 573 (TaEXPB24) to 1257 bp (TaEXPB26) in length (average = 803 bp). The identified TaEXP genes encode proteins ranging from 190 (TaEXPB24) to 418 (TaEXPB26) amino acids (aa) in length (average = 267 aa) (Table S1).
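
The reported ORF and protein lengths agree with each other if the ORF lengths are assumed to include the stop codon, as the short check below illustrates. The helper name is hypothetical and the stop-codon convention is an assumption, since the text does not state it.

```python
# Minimal sketch: convert an ORF length (bp) to a protein length (aa),
# assuming the ORF length includes the terminal stop codon.

def protein_length(orf_bp, includes_stop=True):
    """Number of amino acids encoded by an ORF of the given length."""
    if orf_bp <= 0 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


shortest = protein_length(573)    # ORF length reported for TaEXPB24
longest = protein_length(1257)    # ORF length reported for TaEXPB26
print(shortest, longest)
```

Under this assumption, 573 bp corresponds to 190 aa and 1257 bp to 418 aa, matching the values given for TaEXPB24 and TaEXPB26.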

The expansin gene family in wheat was larger than those of other reported plant genomes. The EXPB family has more members in wheat compared to rice and maize (Table 1). The wheat EXL families have not expanded, as is also the case in the other plants examined except soybean, where the expansion may have led to functional changes. Expansion of the EXPB gene family is particularly impressive in grasses (rice, maize and wheat), with 19, 48 and 42 genes in rice, maize and wheat, respectively, compared to six in Arabidopsis, three in poplar and one in apple. The main reason underlying this difference is the existence of a large number of tandem duplications in the rice, maize and wheat genomes. Notably, the components of the cell wall differ between monocots and dicots: monocots contain more glucuronoarabinoxylan and β-(1 → 3),(1 → 4)-D-glucan. These polysaccharides may be substrates for the EXPB proteins (Sarkar et al. 2009; Yennawar et al. 2006).

Table 1 Number of expansin genes in 10 plant species

Phylogenetic relationships and comparative analysis of the expansin gene family in wheat

In order to evaluate the evolutionary relationships among the expansin proteins, full-length amino acid sequences of 98 TaEXPs, 35 AtEXPs, 56 OsEXPs and 88 ZmEXPs were subjected to multiple sequence alignment using the MEGA5 program. The multiple sequence alignment file was then used to construct an unrooted phylogenetic tree. As shown in Fig. 1, the phylogenetic tree divided the EXPs into four groups (EXPA, EXPB, EXLA and EXLB), which were defined by the maximum likelihood method (Fig. S1). To be consistent with previous studies, we adopted the same names for the 17 subgroups (Li et al. 2002; Lin et al. 2005; Sampedro and Cosgrove 2005; Zhang et al. 2014a). In addition, the 98 TaEXPs alone were used to conduct phylogenetic analyses. Twenty-seven sister pairs and two clusters of paralogous TaEXP genes with very strong bootstrap support (>90 %) were found and are marked by purple shading in the phylogenetic tree (Fig. 2).

Fig. 1
figure 1

Phylogenetic relationships of Arabidopsis, rice, maize and wheat expansin genes. The phylogenetic tree was constructed based on a complete protein sequence alignment of expansins in Arabidopsis, rice, maize and wheat by the neighbor-joining method with bootstrapping analysis (1000 replicates). The subgroups are marked by colored backgrounds

Fig. 2
figure 2

Phylogenetic relationships, gene structure and protein structure of TaEXP genes. The following panels are shown, from left to right. The phylogenetic tree was constructed based on a complete protein sequence alignment of TaEXPs by the neighbor-joining method with bootstrapping analysis (1000 replicates). The subgroups are marked by colored backgrounds. The numbers next to the branches indicate the bootstrap values that support the adjacent node. Ten sister pairs of paralogous expansin genes with very strong bootstrap support (>90 %) are indicated by yellow shading. The gray boxes, blue boxes and black lines in the gene structure diagram represent exons, UTRs and introns, respectively. The scale bar represents 1000 bp. Protein domains were obtained from Pfam and SMART; the red box in the protein structure diagram represents the signal peptide, and the pink and green boxes represent the DPBB domain and the pollen allergen domain, respectively

Structural analyses can provide valuable information concerning duplication events when interpreting phylogenetic relationships within gene families. In the TaEXP gene family, the number of exons varied from 1 (TaEXPB24 and TaEXPB37) to 5 (TaEXPB26 and TaEXPB38) (Fig. 2). Additionally, most members within the same subfamily shared a similar exon structure and length. In the EXLA subfamily, three expansins had 3 exons and TaEXLA4 had 4 exons. Most TaEXPAs had 2, 3 or 4 exons, 24 TaEXPBs had 4 exons, 41 TaEXPBs had 3 exons, and 29 genes had 2 exons. Furthermore, by comparing the gene structures with those of Arabidopsis and rice, we found that they shared similar expansin sequences and intron/exon structures, which indicated that all the TaEXP genes belonged to the expansin gene superfamily. Thus, our findings reveal that the gene structure of expansins is relatively simple.

Next, the protein structure of each expansin gene was analyzed using the SignalP, SMART and Pfam databases (Fig. 2). All wheat expansins contained a conserved region that showed a double-psi beta-barrel (DPBB) fold, which is an enzymatic domain, and pollen allergen in the N-terminus (Fig. 2). The conserved intron/exon structure and protein domain in each subfamily supports their close evolutionary relationship and the classification introduced here. Expansin proteins from different families share only 20–40 % identity with each other. The degree of conservation is highest in domain 1 (Sampedro and Cosgrove 2005). Expansin domain 1 has distant homology to GH45 proteins. Proteins from this family have been crystallized and their mechanism of action has been determined (Sampedro and Cosgrove 2005). On the basis of hydrophobic cluster analysis, this structural motif is also known to be present in expansins (Sampedro and Cosgrove 2005). EXLA and EXLB proteins lack the HFD motif, which suggests that their action may differ from that of others, but the functional implications of these differences are currently unknown. Furthermore, by comparing their structure with those of other plants, we found that all expansins share a similar protein conserved domain and intron/exon structure.

Chromosomal locations of expansins in the wheat genome

The physical locations of expansins on wheat chromosomes are depicted in Fig. 3. Seventy-one TaEXPs were found on 14 of the 21 wheat chromosomes. The A, B and D genomes had 21, 36 and 14 expansins, respectively. Chromosome 3B contained 15 expansins, the largest number on any chromosome; five clusters containing 11 genes were tightly co-located on chromosome 3B, which suggested that tandem duplication events played a role in the evolution of expansins on chromosome 3B. Chromosomes 2B and 5B each had seven genes, while chromosome 7D had only one TaEXP.
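
Per-genome and per-chromosome tallies of this kind can be derived directly from chromosome labels. The sketch below assumes labels of the form "3B" (homoeologous group followed by subgenome letter); the gene names and assignments are invented and do not reproduce the actual TaEXP locations.

```python
# Minimal sketch: tally genes per chromosome and per subgenome (A, B, D)
# from labels such as "3B". Example genes and locations are hypothetical.
from collections import Counter


def tally_locations(gene_to_chromosome):
    """Return (per-chromosome counts, per-subgenome counts)."""
    per_chromosome = Counter()
    per_subgenome = Counter()
    for chromosome in gene_to_chromosome.values():
        label = chromosome.strip().upper()
        if len(label) != 2 or label[0] not in "1234567" or label[1] not in "ABD":
            raise ValueError(f"unexpected chromosome label: {chromosome!r}")
        per_chromosome[label] += 1
        per_subgenome[label[1]] += 1
    return per_chromosome, per_subgenome


example_locations = {
    "geneA": "3B",
    "geneB": "3B",
    "geneC": "2A",
    "geneD": "7D",
    "geneE": "5B",
}

by_chromosome, by_subgenome = tally_locations(example_locations)
print(by_chromosome.most_common(1), dict(by_subgenome))
```

The counts reported in the text are internally consistent in the same way: 21 + 36 + 14 gives the 71 mapped TaEXPs.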

Fig. 3
figure 3

Positions of expansin gene family members on the wheat chromosomes. Sister paralogous pairs are indicated by colored lines

Next, we searched for evidence of past gene duplication events (tandem and segmental) to elucidate the mechanism underlying the expansion of the TaEXP gene family. As shown in Fig. 3, 15 segmental duplication events of TaEXPs (20 genes) were found in the wheat genome. All of these segmentally duplicated genes were also found to be paralogs in our phylogenetic analysis, as shown in Fig. 2. These results indicated that segmental duplications played important roles in expansin gene family expansion in the wheat genome. Moreover, three clusters were tightly co-located in the wheat genome, which suggested that tandem duplication events also played important roles in the evolution of the expansin gene family in wheat. To explore different selection constraints on the duplicated TaEXP genes, we calculated Ks and Ka/Ks ratios for each duplicated pair. A Ka/Ks ratio greater than one generally indicates accelerated evolution with positive selection, a ratio equal to one corresponds to neutral selection, and a ratio less than one indicates negative or purifying selection. The Ka/Ks ratios of all duplicated pairs were less than one, implying strong purifying selection (Table S2). These results suggest that the functions of the duplicated genes have not diverged drastically over the course of evolution following the duplication events.
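
The interpretation of the Ka/Ks ratio can be written as a small classification rule. The sketch below is illustrative only: the function name, the tolerance used to call a ratio "equal to one" and the example Ka and Ks values are assumptions, not values from Table S2.

```python
# Minimal sketch: classify the selection mode of a duplicated gene pair
# from its Ka and Ks values. Example values are hypothetical.

def selection_mode(ka, ks, tolerance=1e-9):
    """Return (Ka/Ks ratio, selection label) for one gene pair."""
    if ks <= 0:
        raise ValueError("Ks must be positive to compute Ka/Ks")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tolerance:
        label = "neutral"
    elif ratio > 1.0:
        label = "positive"
    else:
        label = "purifying"
    return ratio, label


example_pairs = {
    "pair_1": (0.05, 0.40),   # ratio 0.125
    "pair_2": (0.30, 0.20),   # ratio 1.5
    "pair_3": (0.25, 0.25),   # ratio 1.0
}

results = {name: selection_mode(ka, ks) for name, (ka, ks) in example_pairs.items()}
for name, (ratio, label) in results.items():
    print(f"{name}: Ka/Ks = {ratio:.3f} ({label})")
```

In practice, a ratio is rarely exactly one, so a statistical test is usually needed before calling a pair neutral; the fixed tolerance here is only a placeholder.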

GO analysis

To predict the biological functions of TaEXP, cellular roles and GO category annotations were explored using the ProtFun server. The majority of genes fell into the cell envelope functional category and GO categories of stress response and immune response (Fig. 4), indicating their possible roles in these processes. In particular, 89 % of TaEXPs were classified into the cell envelope functional category, and more than 46 % of TaEXPs belonged to the stress response category.

Fig. 4
figure 4

Functional categorization analysis of wheat expansins using ProtFun. a Functional category, b Gene ontology category

Determination of TaEXP expression pattern by microarray analysis

To investigate the expression profiles of expansins during wheat development, we analyzed the expression of the expansin genes at various developmental stages using Genevestigator. A total of 19 genes were found to have different expression levels across 10 stages (germination, seedling growth, tillering, stem elongation, booting, inflorescence emergence, anthesis, milk development, dough development and ripening). All 19 genes were detected in more than one stage (Fig. 5a). Most genes, such as TaEXPB1, TaEXPB4, TaEXPB5, TaEXLA3 and TaEXLA4, were expressed in all developmental stages. Based on hierarchical clustering, the expression patterns of the wheat expansin genes were divided into three groups, marked a, b and c. Group a genes exhibited low expression values at all ten stages, 3 of the 9 group b genes showed higher expression at tillering, and all 7 group c genes showed higher expression at the booting stage than at the other stages. Overall, the broad expression patterns of expansins suggest that they play important roles in wheat development.

Fig. 5
figure 5

The expression profiles of TaEXP genes in development stages and tissues. a The expression profile of TaEXP genes in 10 stages of development. The dark and light colour shadings represent relative high or low expression levels, respectively, of the TaEXP genes in ten stages of development (germination, seedling growth, tillering, stem elongation, booting, inflorescence emergence, anthesis, milk development, dough development and ripening). b The expression profile of TaEXP genes in 22 different tissues. The dark and light colour shadings represent relative high or low expression levels, respectively, of the TaEXP genes in 22 tissues, including seedling, coleoptile, seedling leaf, shoot apex, mesocotyl, crown, seedling root, inflorescence, spike, spikelet, anther, pistil, glume, caryopsis, embryo, endosperm, shoot, shoot leaf, sheath, flag leaf, crown and rhizome roots

Next, we investigated expansin expression patterns in a variety of wheat tissues. A heatmap of the expression profiles of 19 expansin genes in 22 wheat tissues is shown in Fig. 5b. The expansin gene expression patterns could be divided into three groups, namely groups I, II and III, based on the results of hierarchical clustering analysis. Most group I expansin genes were found to have higher expression in caryopsis, embryo and endosperm. Most group II genes had higher expression levels in rhizome roots. Group III genes had higher expression in seedling, coleoptile, seedling leaf, shoot apex, mesocotyl, crown, seedling root and inflorescence than in other tissues, which suggested that group III genes may play important roles in seedling development.

Discussion

Expansins play important roles in diverse processes, including developmental programs, defense and abiotic stress responses (Sampedro and Cosgrove 2005). In addition, the presence of expansin genes in land plants ranging from mosses to eudicots makes them interesting candidates for studying the evolution of plant development (Carey and Cosgrove 2007; Dal Santo et al. 2013; Zhu et al. 2014). Previous genome-wide analyses showed that expansins are widely present in plants, including mosses (Carey and Cosgrove 2007), legumes (Zhu et al. 2014), crucifers (Sampedro et al. 2006), Rosaceae (Zhang et al. 2014a) and grasses (Sampedro et al. 2006). However, genomic data on wheat expansins have not been available to date.

In this study, we conducted the first genome-wide analysis of expansins in wheat and identified 98 expansins containing full ORFs, which constitutes the largest expansin family among the well-studied species (Krishnamurthy et al. 2015; Zhang et al. 2014a; Zhu et al. 2014). By comparing across different plant species, we found a large expansion of the EXPB subfamily in rice and wheat, which may be due to the difference in substrates in the cell walls of monocots and dicots. EXPB may interact with glucuronoarabinoxylan and β-(1 → 3),(1 → 4)-D-glucan, which are abundant in monocots (Sarkar et al. 2009; Yennawar et al. 2006). What factors drove expansion of the EXPB subfamily during plant evolution? In wheat, 3 tightly co-located clusters were found to contain 6 EXPBs, and 15 sister gene pairs were found among the EXPBs. Thus, both segmental and tandem duplication events may have played roles in the evolution of EXPBs in wheat.

Generally, orthologous genes and genes from the same subfamily have similar functions. In Arabidopsis, the root hair-specific AtEXPA7 and AtEXPA18 genes are necessary for root hair elongation (Cho and Cosgrove 2002; Lin et al. 2011). TaEXPA33 and TaEXPA35 are phylogenetically related to AtEXPA7 as shown in Fig. 1, suggesting that they may share similar expression patterns and functions. AtEXPA1, another well-studied expansin, can accelerate stomatal opening by decreasing the volumetric elastic modulus (Zhang et al. 2011), so the homologous genes TaEXPA15 and TaEXPA24 may play similar roles in wheat. Likewise, TaEXLAs may have a similar function in abiotic stress to that of the homologous gene AtEXLA2 in Arabidopsis (Abuqamar et al. 2013). Moreover, TaEXPB4 and TaEXPB32 may play roles in seedling growth based on the known role of OsEXP4 in rice (Choi et al. 2003).

Gene expression data revealed the expression patterns of 19 expansins across 10 wheat developmental stages and 22 tissues. The results of our expression pattern analysis suggest that expansin genes may play different roles during development and in different tissues. Furthermore, the similarities in the expression patterns of most sister genes indicate that functional redundancy is widespread in expansins. GmEXPB2 affects soybean nodulation by modifying the root architecture and promoting nodule formation and development (Li et al. 2015). HvEXPB7, which was revealed by studying the root hair transcriptome of Tibetan wild barley, improves root hair growth under drought stress (He et al. 2015). Overexpression of ClEXPA1 and ClEXPA2 affects growth and development in transgenic tobacco and increases the amount of cellulose in stem cell walls (Wang et al. 2011). DzEXP1 and DzEXP2 are involved in fruit development (Palapol et al. 2015). Thus, expansins play very important roles in the development of plant organs, including roots, stems and fruits.

In summary, we present here a complete genomic analysis of the expansin family in wheat along with a wide range of expression data. Our findings should be helpful in laying the foundation for functional characterization of the expansin gene family and further gaining an understanding of the structure–function relationship between these family members. Additionally, our study provides comprehensive information and novel insights into the evolution and divergence of expansin genes in plants. Taken together, these data may aid in elucidating the molecular basis of many important aspects of wheat physiology, such as root development and other processes.