Introduction

Blue copper proteins containing a single type I mononuclear copper site are known as phytocyanins (PCs) in plants, where they are associated with electron carrier activity (Giri et al. 2004). The phytocyanin domain has a core β-sandwich comprising seven β-strands, and a disulfide bridge closing the metal centre is a characteristic feature (Hart et al. 1996). Based on the ligand composition of the copper-binding site, the glycosylation state, the domain structure and the spectral characteristics, PCs can be grouped into four subfamilies: uclacyanins (UCs), plantacyanins (PLCs), stellacyanins (SCs) and early nodulin-like proteins (ENODLs) (Cao et al. 2015).

The copper ion in SCs is ligated by two His, one Cys and one Gln residue. PLCs and UCs also have two His and one Cys, but the Gln is replaced by a Met (Nersissian et al. 1998). Although PLCs have the same four conserved ligands as UCs, they lack putative glycosylation sites on the backbone (Nersissian et al. 1998). Interestingly, early nodulin-like proteins (ENODLs) might be involved in Cu-independent processes, since they lack key copper-binding residues (Nersissian et al. 1998). PCs containing AG (arabinogalactan) glycomodules and signal peptides (SPs) are considered members of the arabinogalactan protein (AGP) superfamily (Mashiguchi et al. 2009).

PCs play an important role in plant growth and development, in addition to being of interest for their spectroscopic and redox properties. OsUCL29 and ZmUC22 are strongly expressed under various stresses. BrUCL16 and ZmUC19 are specifically expressed in the stem and silique, respectively, which suggests that the UCs, the first of the four PC subfamilies, may function in the evolution of polyploid plants (Ma et al. 2011; Li et al. 2013; Cao et al. 2015). Six SCs are induced by oxidative stress and Al toxicity in Arabidopsis (Ezaki et al. 2001, 2005). PeSCL1 and PeSCL3 are strongly expressed in the stems and roots of Phalaenopsis equestris (Xu et al. 2017). The PLC subfamily participates in the growth processes of several specific plants, including pollination through S-RNase binding in tobacco (Cruz-Garcia et al. 2005). AtPLCs have also been identified as targets of the microRNA miR408, which is related to plant growth (Sunkar and Zhu 2004). Dong et al. (2005) used promoter–β-glucuronidase transgenic plants and immunohistochemical analysis of wild-type pistil tissues. They found that AtPLCs were strongly expressed in the pistil, where they may prevent pollination and disrupt the endothecium structure, thereby influencing anther development. The last subfamily (ENODLs) is involved in many aspects of plant development. ENODLs have important functions in the transport of nutrients, solutes, amino acids and hormones, and they improve fitness against pathogens during host colonization at the plant–microbe interface (Denancé et al. 2014). Additionally, ENODLs are highly expressed in the inflorescences of some plants, for example AtENODL3/4 (Mashiguchi et al. 2009), BrENODL22/27 (Li et al. 2013) and PeENODL5/7 (Xu et al. 2017). ENODLs are also relevant to plant defence responses: Mashiguchi et al. (2009) found that AtENODL2/18 were induced by osmotic and salt stress. Using hybridisation analysis, Yoshizaki et al. (2000) demonstrated that ENODLs were specifically expressed in tissues including the apical buds of Pharbitis nil and the root nodules of legumes. They also showed that expression of these genes was distinctly down-regulated during floral induction, so these genes might be involved in organ differentiation (Yoshizaki et al. 2000). AtFLA3, another type of chimeric AGP, is involved in microspore development and in the formation of the pollen intine through cellulose deposition. Overexpression of this gene restricted pollination and lowered seed production (Li et al. 2010). AtENODL14 is localized at the synergid cell surface and interacts strongly and specifically with the extracellular domain of the receptor-like kinase FERONIA, which may precisely control pollen tube reception (Escobar-Restrepo et al. 2007). In quintuple ENODL loss-of-function mutants, wild-type pollen tubes fail to stop growing and fail to rupture after entering the ovules. This implies that ENODLs have a core function in male–female communication and pollen tube reception (Hou et al. 2016). Furthermore, overexpression of AtENODL15 driven by its endogenous promoter results in disturbed pollen tube guidance and reduced fertility (Hou et al. 2016).

To date, the features and functions of the PC gene family have been identified and investigated in several plant species, including Arabidopsis, rice, Chinese cabbage, maize, and orchid. However, no comprehensive analyses of the PC gene family in poplar have been conducted. In the present study, we identified 74 probable PtPC genes in Populus and performed comprehensive phylogenetic, structural, promoter, gene expansion and microsynteny analyses. Additionally, we selected 18 PtPCs to investigate their behaviour under drought and salt treatments. The results provide valuable information about their biological functions and stress responses. Furthermore, analysis of tissue-specific expression of the PtPC genes during development showed differences in their spatiotemporal expression patterns, and many were expressed at high levels in roots and xylem. Genome-wide analysis of the PC genes in Populus trichocarpa will facilitate a better understanding of the role of this gene family during poplar growth and development.

Materials and methods

Identification of PC family genes in poplar

The sequences of previously identified PC genes in Arabidopsis were downloaded from the NCBI database (http://www.ncbi.nlm.nih.gov/). We performed BLASTP searches (E value < 1e−6) with the Arabidopsis PC proteins as queries to identify PtPCs in the Phytozome database (http://www.phytozome.net). Each protein sequence identified by BLASTP was checked for the presence of a plastocyanin-like domain (PCLD; Pfam PF02298) to confirm membership of the PC gene family. The signal peptide (SP), glycosylphosphatidylinositol (GPI) anchor signal and N-glycosylation sites of PtPCs were predicted with the following tools, respectively: the SignalP 4.1 server (Petersen et al. 2011; Jeßberger et al. 2015), the Big-PI Plant Predictor (Eisenhaber et al. 2003) and the NetNGlyc 1.0 server (http://www.cbs.dtu.dk/services/NetNGlyc/). Potential arabinogalactan glycomodules (AGs) were predicted based on previously reported criteria (Mashiguchi et al. 2004). The subcellular localization of all PtPCs was predicted using the CELLOv2.5 server (http://cello.life.nctu.edu.tw/).
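The candidate-selection steps above can be sketched as follows: an E-value filter, a domain check, and removal of identical sequences (the Results report 77 hits collapsing to 74). The sketch assumes BLAST+ tabular output (`-outfmt 6`, where column 2 is the subject ID and column 11 the E value). It also assumes that the set of proteins carrying a PF02298 domain was obtained beforehand from a separate domain scan. All IDs and sequences below are hypothetical. This is not the authors' actual pipeline.

```python
from typing import Dict, Iterable, List, Set

E_VALUE_CUTOFF = 1e-6  # BLASTP threshold stated in the text


def filter_blast_hits(lines: Iterable[str], cutoff: float = E_VALUE_CUTOFF) -> Set[str]:
    """Return subject IDs from BLAST+ tabular output (-outfmt 6) with E value < cutoff."""
    hits: Set[str] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        subject_id, evalue = cols[1], float(cols[10])  # sseqid and evalue columns
        if evalue < cutoff:
            hits.add(subject_id)
    return hits


def confirm_and_deduplicate(
    hits: Set[str], has_pcld: Set[str], sequences: Dict[str, str]
) -> List[str]:
    """Keep hits carrying a plastocyanin-like domain and drop exact duplicate sequences.

    When several IDs share an identical sequence, the alphabetically first ID is kept.
    """
    kept: List[str] = []
    seen: Set[str] = set()
    for seq_id in sorted(hits & has_pcld):
        seq = sequences[seq_id]
        if seq in seen:
            continue  # identical protein already represented
        seen.add(seq)
        kept.append(seq_id)
    return kept


# Hypothetical example: IDs and sequences are invented for illustration.
blast_lines = [
    "AtQ1\tPt_a\t60.0\t100\t40\t0\t1\t100\t1\t100\t1e-30\t200",
    "AtQ1\tPt_b\t30.0\t80\t56\t2\t1\t80\t5\t84\t1e-3\t40",
    "AtQ2\tPt_c\t55.0\t100\t45\t0\t1\t100\t1\t100\t2e-25\t180",
    "AtQ2\tPt_d\t55.0\t100\t45\t0\t1\t100\t1\t100\t2e-25\t180",
]
sequences = {"Pt_a": "MAKVLL", "Pt_b": "MSSQ", "Pt_c": "MGNRW", "Pt_d": "MGNRW"}
kept = confirm_and_deduplicate(filter_blast_hits(blast_lines), {"Pt_a", "Pt_c", "Pt_d"}, sequences)
print(kept)
```

Here Pt_b fails the E-value cutoff, and Pt_d is dropped because its sequence is identical to Pt_c.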

Phylogenetic analysis

Protein sequences and alignments were analyzed using the DNAMAN program, and a phylogenetic tree was constructed with default parameters using the neighbour-joining (NJ) method in MEGA6.0 with 1000 bootstrap replicates (Hu et al. 2010; Tamura et al. 2013).
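To make the neighbour-joining step concrete, here is a minimal pure-Python sketch of the NJ algorithm (Saitou and Nei). It assumes a pairwise distance matrix is already available. MEGA6 would derive that matrix from the protein alignment with a substitution model and would repeat the construction on 1000 column-resampled alignments for bootstrap support; neither step is modelled here. The toy matrix is illustrative only.

```python
from typing import List


def neighbor_joining(names: List[str], dist: List[List[float]]) -> str:
    """Build an unrooted neighbour-joining tree and return it in Newick format.

    `dist` is a symmetric matrix of pairwise distances in the same order as `names`.
    """
    if len(names) < 3:
        raise ValueError("neighbour-joining needs at least three taxa")
    nodes = list(names)               # Newick strings for the current clusters
    d = [row[:] for row in dist]      # working copy of the distance matrix
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]   # net divergence of each cluster
        # Pick the pair (i, j) minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        best_q, bi, bj = None, -1, -1
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best_q is None or q < best_q:
                    best_q, bi, bj = q, i, j
        # Branch lengths from i and j to the new internal node u.
        li = 0.5 * d[bi][bj] + (r[bi] - r[bj]) / (2 * (n - 2))
        lj = d[bi][bj] - li
        joined = f"({nodes[bi]}:{li:g},{nodes[bj]}:{lj:g})"
        keep = [k for k in range(n) if k not in (bi, bj)]
        # Distance from u to every remaining cluster k.
        du = [0.5 * (d[bi][k] + d[bj][k] - d[bi][bj]) for k in keep]
        d = [[d[a][b] for b in keep] + [du[x]] for x, a in enumerate(keep)] + [du + [0.0]]
        nodes = [nodes[k] for k in keep] + [joined]
    # Three clusters left: join them at a single central node.
    la = 0.5 * (d[0][1] + d[0][2] - d[1][2])
    lb = 0.5 * (d[0][1] + d[1][2] - d[0][2])
    lc = 0.5 * (d[0][2] + d[1][2] - d[0][1])
    return f"({nodes[0]}:{la:g},{nodes[1]}:{lb:g},{nodes[2]}:{lc:g});"


# Additive toy matrix for the unrooted tree ((A:1,B:2):1,(C:3,D:1)); values are illustrative.
names = ["A", "B", "C", "D"]
dist = [
    [0, 3, 5, 3],
    [3, 0, 6, 4],
    [5, 6, 0, 4],
    [3, 4, 4, 0],
]
tree = neighbor_joining(names, dist)
print(tree)
```

Because the toy distances are additive, NJ recovers the generating tree exactly, written as `(C:3,D:1,(A:1,B:2):1);`.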

Analysis of exons/introns, conserved motifs, and chromosomal location

We analyzed the exon/intron structure of PC genes by comparing the coding DNA sequence (CDS) and the corresponding genomic DNA sequence using the online GSDS server (http://gsds.cbi.pku.edu.cn/). Conserved motifs were predicted using the online MEME program (http://meme.nbcr.net/meme/cgi-bin/meme.cgi). An image of the chromosomal location was constructed using MapInspect software based on the initial positional information provided in the Phytozome database.
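The intron counts reported later (for example, "most include two introns") come from comparing CDS and genomic coordinates. A minimal sketch of that comparison follows. It assumes 1-based, inclusive exon coordinates of one gene model, as found in a GFF file, and treats every gap between consecutive exons as an intron. Introns lying in UTRs would be missed if only CDS exons are supplied. The coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive genomic coordinates (GFF style)


def introns_from_exons(exons: List[Interval]) -> List[Interval]:
    """Return intron intervals as the gaps between consecutive exons of one transcript."""
    ordered = sorted(exons)
    introns: List[Interval] = []
    for (_, end1), (start2, _) in zip(ordered, ordered[1:]):
        if start2 > end1 + 1:  # at least one base separates the two exons
            introns.append((end1 + 1, start2 - 1))
    return introns


# Hypothetical CDS exon coordinates for one PtPC gene model (given out of order on purpose).
exons = [(2001, 2350), (1001, 1300), (1451, 1700)]
introns = introns_from_exons(exons)
print(len(introns), introns)
```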

Analysis of microsynteny and gene duplication

A syntenic block is defined as a region in which orthologous genes are located within 15 genes upstream or downstream of the anchor gene in both genomes (Wang et al. 2015). Syntenic blocks containing PtPC genes on different chromosomes were obtained from the PGDD database (http://chibba.agtec.uga.edu/duplication). Microsynteny analysis was performed using MicroSyn software, and the online OrthoMCL program (http://orthomcl.org/orthomcl/) was used to analyze duplicated genes.
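The 15-gene window in this definition can be illustrated with a small sketch. Given the gene order around two anchor genes and a set of ortholog pairs, it lists the pairs whose members both fall inside the flanking windows. This is not how MicroSyn or PGDD are implemented. The gene names, orders and ortholog pairs are hypothetical, and any minimum number of conserved pairs needed to call microsynteny would be an extra, user-chosen threshold.

```python
from typing import List, Set, Tuple

WINDOW = 15  # genes upstream/downstream, per the syntenic-block definition in the text


def flanking(order: List[str], gene: str, window: int = WINDOW) -> Set[str]:
    """Genes within `window` positions on either side of `gene` in an ordered gene list."""
    i = order.index(gene)
    return set(order[max(0, i - window):i] + order[i + 1:i + 1 + window])


def conserved_flanking_pairs(
    order_a: List[str], anchor_a: str,
    order_b: List[str], anchor_b: str,
    orthologs: Set[Tuple[str, str]], window: int = WINDOW,
) -> List[Tuple[str, str]]:
    """Ortholog pairs whose members both fall in the flanking windows of the two anchors."""
    fa = flanking(order_a, anchor_a, window)
    fb = flanking(order_b, anchor_b, window)
    return sorted((x, y) for x, y in orthologs if x in fa and y in fb)


# Hypothetical gene orders around a poplar PC gene (PtX) and an Arabidopsis PC gene (AtY).
order_a = ["a1", "a2", "PtX", "a3", "a4"]
order_b = ["b1", "AtY", "b2", "b3"]
orthologs = {("a2", "b1"), ("a3", "b2"), ("a4", "b3"), ("a1", "b9")}
pairs_15 = conserved_flanking_pairs(order_a, "PtX", order_b, "AtY", orthologs)
pairs_1 = conserved_flanking_pairs(order_a, "PtX", order_b, "AtY", orthologs, window=1)
print(pairs_15, pairs_1)
```

Narrowing the window from 15 genes to 1 drops the more distant pair (a4, b3). This shows how the window size shapes which blocks count as syntenic.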

Evaluation of Ka/Ks values

Ka (number of non-synonymous substitutions per non-synonymous site)/Ks (number of synonymous substitutions per synonymous site) ratios were calculated using the DnaSP software. Sliding-window analysis of Ka/Ks ratios was also performed with a window size of 150 bp and a step size of 9 bp. Divergence time (T) was estimated as $T = K_s/(2 \times 9.1 \times 10^{-9}) \times 10^{-6}$ million years ago (Mya).
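The divergence-time formula and the sliding-window layout can be written out directly, as below. The sketch assumes that the rate 9.1 × 10^-9 from the text is in substitutions per synonymous site per year. It also assumes that windows are laid out on the aligned coding sequence as 0-based, half-open intervals. How DnaSP handles alignment gaps and codon boundaries is not modelled, and the Ks value in the usage line is hypothetical.

```python
from typing import List, Tuple

SUBSTITUTION_RATE = 9.1e-9  # rate used in the text, assumed per synonymous site per year


def divergence_time_mya(ks: float, rate: float = SUBSTITUTION_RATE) -> float:
    """T = Ks / (2 * rate), converted from years to million years."""
    return ks / (2 * rate) * 1e-6


def sliding_windows(length: int, window: int = 150, step: int = 9) -> List[Tuple[int, int]]:
    """0-based half-open (start, end) windows for a sliding Ka/Ks profile along a CDS."""
    return [(s, s + window) for s in range(0, length - window + 1, step)]


# Hypothetical Ks value: 0.182 / (2 * 9.1e-9) = 1.0e7 years, i.e. about 10 Mya.
print(round(divergence_time_mya(0.182), 2))
print(sliding_windows(168))
```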

Promoter and microarray analysis

The 2000 bp sequences upstream of the translation start sites of PtPCs were downloaded from the Phytozome database (Goodstein et al. 2011). These sequences were used to identify putative cis-elements with PlantCARE (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/). To better understand the expression levels of PtPCs, the GSE13990 poplar expression profiling array data were downloaded from the Gene Expression Omnibus (GEO) database at the NCBI (https://www.ncbi.nlm.nih.gov/). The probes corresponding to PC genes were identified using the online ProbeMatch tool at the NetAffx Analysis Center (http://www.affymetrix.com/analysis/index.affx). The expression data for each PC gene were then extracted from the GSE13990 data set using its matched probe.
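For readers reproducing the promoter extraction, the main pitfall is strand orientation. For a minus-strand gene, the region upstream of the start codon lies at higher plus-strand coordinates and must be reverse-complemented. The sketch below assumes 1-based, inclusive, plus-strand CDS coordinates (GFF style), with `cds_start`/`cds_end` as hypothetical parameter names. Regions running off a chromosome end are truncated rather than padded. Phytozome's own download interface may apply its own conventions.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (IUPAC ambiguity codes other than N not handled)."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_promoter(chrom: str, cds_start: int, cds_end: int, strand: str, length: int = 2000) -> str:
    """Return `length` bp upstream of the translation start, 5'->3' on the gene's strand.

    cds_start/cds_end are 1-based inclusive plus-strand coordinates of the CDS.
    """
    if strand == "+":
        begin = max(0, cds_start - 1 - length)
        return chrom[begin:cds_start - 1]
    if strand == "-":
        # On the minus strand the start codon sits at cds_end, so "upstream" lies to the right.
        return reverse_complement(chrom[cds_end:cds_end + length])
    raise ValueError(f"unknown strand: {strand!r}")


# Toy chromosomes: ATG at positions 6-8 on the plus strand; CAT (ATG on minus) at 4-6.
print(upstream_promoter("TTGACATGAAA", 6, 11, "+", 4))
print(upstream_promoter("AAACATGCA", 1, 6, "-", 3))
```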

Plant materials and stress treatments

Eight-week-old P. trichocarpa (Torr. & Gray) seedlings used in all experiments were grown in the tissue culture laboratory. Plants were treated with 20% PEG (polyethylene glycol) and 200 mM NaCl to impose drought and salt stress, respectively, and untreated plants were used as controls. Leaves were collected for RNA extraction at four time points (4, 8, 12 and 24 h) after treatment.

RNA isolation and qRT-PCR analysis after different stress treatments

Total RNA was extracted with TRIzol reagent from young poplar leaves under the different stress treatments and from different organs, and first-strand cDNAs were synthesized. Primers for amplifying PtPC genes were designed with Primer Premier 5.0 and checked with the NCBI Primer-BLAST tool (Table S1). The poplar housekeeping gene encoding ubiquitin (UBQ, gene ID Potri.001G418500) was used as an internal control to normalize expression data (Hui et al. 2014). qRT-PCR was performed in a 20-μl reaction volume. Each reaction contained 10 μl of 2 × SYBR® Premix Ex Taq™ (TaKaRa, Otsu, Japan), 0.4 μl of 50 × ROX Reference Dye, 2 μl of diluted cDNA template, 0.8 μl of each specific primer and 6 μl of ddH2O. The reaction conditions were 95 °C for 30 s, followed by 40 cycles of denaturation at 95 °C for 5 s and annealing at 55–60 °C for 34 s. Relative expression levels were calculated using the $2^{-\Delta\Delta C_T}$ method. Expression of each gene in the control plants [CK (0 h)] was normalized to 1, as described previously for stress treatments (Schmittgen and Livak 2008). GraphPad software was used for statistical analysis, and three biological replicates and three technical replicates were performed for each sample.
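The $2^{-\Delta\Delta C_T}$ calculation described above is sketched below. It assumes technical-replicate Ct values are averaged before computing ΔCt, with UBQ as the reference gene. How the three biological replicates were then summarized (for example, mean ± SD in GraphPad) is not modelled. All Ct values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def relative_expression(
    target_treated: Sequence[float], ref_treated: Sequence[float],
    target_control: Sequence[float], ref_control: Sequence[float],
) -> float:
    """2^-ddCt relative expression of a target gene, normalised to a reference gene.

    Each argument holds technical-replicate Ct values, which are averaged first.
    dCt = Ct(target) - Ct(reference); ddCt = dCt(treated) - dCt(control).
    """
    delta_treated = mean(target_treated) - mean(ref_treated)
    delta_control = mean(target_control) - mean(ref_control)
    return 2 ** -(delta_treated - delta_control)


# Hypothetical Ct values for one PtPC gene and UBQ at one stress time point.
fold = relative_expression([24.0, 24.5, 23.5], [20.0, 20.25, 19.75],
                           [26.0, 26.5, 25.5], [20.0, 20.0, 20.0])
print(fold)
```

By construction, the control compared with itself yields 1, which matches the normalization of CK (0 h) to 1 described in the text.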

Results

Identification of PC family genes in poplar

We identified 77 potential PC protein sequences in poplar. Three of these were identical to other sequences, leaving 74 putative PtPCs (Table 1). Based on multiple sequence alignments (Fig. 1) and the predicted copper-binding ligands (His, Cys, His and Met/Gln), PtPCs were classified into three subfamilies: uclacyanin-like proteins (PtUCs, 7), stellacyanin-like proteins (PtSCs, 19) and plantacyanins (PtPLCs, 3). The remaining 45 PtPCs belonged to the ENODL subfamily based on their modified copper-binding residues. Moreover, subcellular localization prediction indicated that 93.2% of PtPCs (69 of 74) were associated with the plasma membrane or were extracellular (Table 1). Only five PCs (PtENODL9, PtENODL27, PtUC6, PtSC8 and PtSC18) were predicted to localize in the nucleus.

Table 1 List of PC genes identified in poplar and their sequence characteristics
Fig. 1

Multiple sequence alignment of the amino acid sequences of the plastocyanin-like domains (PCLD) of PtPC proteins in Populus trichocarpa. Red, uclacyanins (UCs); blue, stellacyanins (SCs); cyan, plantacyanins (PLCs); purple, early nodulin-like proteins (ENODLs). The conserved amino acids involved in copper binding (His, Cys, His and Gln/Met) are highlighted with a green background, while the Cys residues involved in disulfide linkage are highlighted with a yellow background

Phylogenetic and structural analysis of PtPCs

To better understand their structure and function, we predicted the N-terminal signal peptides (SPs), glycosylphosphatidylinositol (GPI) anchor signals (GASs), AG glycomodules and N-glycosylation sites of PtPCs (Fig. S1; Table 1). Sixty PtPCs had a predicted SP for targeting to the endoplasmic reticulum, and 49 PtPCs had GASs associated with plasma membrane localization. About 58.1% of PtPCs contained putative AG glycomodules in the PAST-rich region (Pro, Ala, Ser, Thr). Sixty PtPCs had putative N-glycosylation sites in the PCLD and PAST-rich region.
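As a rough illustration of the sequence features counted above, the sketch scans a protein for the classic N-glycosylation sequon N-X-S/T (X ≠ Pro) and computes the PAST fraction of a region. This simple motif scan is a swapped-in technique, not NetNGlyc. NetNGlyc uses neural networks and reports only a subset of sequons as likely glycosylated. No PAST-richness threshold is given in the text, so none is applied here. The example sequences are hypothetical.

```python
import re
from typing import List

# N-X-S/T with X != Pro; the zero-width lookahead also reports overlapping sequons.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glyc_sequons(protein: str) -> List[int]:
    """1-based positions of Asn residues in N-X-S/T sequons (X != Pro)."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


def past_fraction(region: str) -> float:
    """Fraction of Pro, Ala, Ser and Thr residues in a sequence region."""
    region = region.upper()
    if not region:
        return 0.0
    return sum(region.count(aa) for aa in "PAST") / len(region)


# Hypothetical sequences: NVT and NGS are sequons, NPS is excluded because X is Pro.
print(n_glyc_sequons("MNVTANPSQNGS"))
print(past_fraction("PASTGG"))
```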

According to the predicted domain structures, PtPCs were divided into six types (Fig. 2):

- Type I includes an N-terminal SP, a PCLD, an arabinogalactan-like region (ALR) and a C-terminal GAS.
- Type II lacks the GAS.
- Type III resembles type I but lacks the ALR.
- Type IV has only an SP and a PCLD.
- Type V resembles type III but lacks the SP.
- Type VI contains only a PCLD.
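The six domain-architecture types can be expressed as a lookup on three predicted features, with the PCLD assumed present in every protein. The sketch below encodes the definitions given above. Feature combinations that the text does not define, such as an ALR without an SP, are reported as "unclassified". The function name and inputs are hypothetical.

```python
from typing import Dict, Tuple

# (has_SP, has_ALR, has_GAS) -> type, following the definitions in the text.
DOMAIN_TYPES: Dict[Tuple[bool, bool, bool], str] = {
    (True, True, True): "I",      # SP + PCLD + ALR + GAS
    (True, True, False): "II",    # type I without GAS
    (True, False, True): "III",   # type I without ALR
    (True, False, False): "IV",   # SP + PCLD only
    (False, False, True): "V",    # type III without SP
    (False, False, False): "VI",  # PCLD only
}


def domain_type(has_sp: bool, has_alr: bool, has_gas: bool) -> str:
    """Assign a PtPC domain-architecture type; combinations not defined in the text are flagged."""
    return DOMAIN_TYPES.get((has_sp, has_alr, has_gas), "unclassified")


print(domain_type(True, True, False))   # II
print(domain_type(False, True, True))   # unclassified
```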

Fig. 2

Schematic representation of five groups of PtPCs. The diagram showing the features of PtPC domains was generated with MyDomains (http://prosite.expasy.org/cgi-bin/prosite/mydomains/). The figure is not drawn to scale

To investigate the evolutionary relationships of PC genes in Arabidopsis, Zea mays and poplar, we constructed a phylogenetic tree of AtPC, ZmPC and PtPC protein sequences (Fig. 3). The 38 AtPCs, 60 ZmPCs and 74 PtPCs were divided into seven clades, consistent with previous studies in Arabidopsis, rice and Phalaenopsis equestris. Clade VII had the largest number of PC members (61), while clade VI contained the fewest (12). Each of these two clades consisted of a single subfamily: clade VI contained only SCs and clade VII only ENODLs. Moreover, except for ZmUC18/26 and PtUC7, all PCs in clade IV belonged to the ENODL subfamily.

Fig. 3

Phylogenetic tree of phytocyanins (PCs) from Populus trichocarpa, Arabidopsis and Zea mays. The tree was constructed from PC protein sequences of the three species by the neighbour-joining method in MEGA6.0, and bootstrap values (1000 replicates) are displayed at each node. PC proteins are divided into seven groups, and subfamilies are indicated by different colours: red, early nodulin-like proteins (ENODLs); green, stellacyanins (SCs); yellow, uclacyanins (UCs); cyan, plantacyanins (PLCs); purple, unknown

Gene structure and conserved motifs

Gene structural diversity is generally thought to reflect the evolution of multigene families. The exon/intron structures of PtPCs are shown in Fig. S2. PtPC gene structures are not complex, and most genes contain two introns. PtSC8 and PtENODL1, 3, 34, 35 and 38 have only one intron, and eight PtPCs contain no introns. In addition to exon/intron patterns, conserved motifs could also be important for the diversified functions of PCs. We identified 10 conserved motifs (Fig. 4), and members of the same subfamily shared similar motif compositions. Details of each motif are shown in Table S2. Motifs 1, 2, 6 and 10 are related to intermolecular electron transfer reactions, whereas the others have no functional annotation. Most PtPCs contain motifs 1, 3 and 6 in the order 1-6-3. Motif 1 is the most common motif and is present in all PtPC proteins. Motifs 7, 9 and 10 are present only in PtENODLs. PtUC1 differs from other PtUCs in lacking motif 6, and all PtSCs except PtSC7 have motif 4.
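Whether a protein carries motifs in the order 1-6-3 can be checked mechanically once motif sites are known. The sketch below assumes motif hits have already been parsed into hypothetical (motif_id, start) pairs; this is not MEME's own output format. It sorts the hits by position and then tests whether 1, 6, 3 occur as an ordered, not necessarily contiguous, subsequence.

```python
from typing import List, Tuple


def motif_order(hits: List[Tuple[int, int]]) -> List[int]:
    """Motif IDs sorted by start position; hits are (motif_id, start) pairs."""
    return [motif for motif, _ in sorted(hits, key=lambda h: h[1])]


def contains_in_order(order: List[int], pattern: List[int]) -> bool:
    """True if `pattern` occurs in `order` as an ordered (not necessarily contiguous) subsequence."""
    it = iter(order)
    return all(m in it for m in pattern)  # each `in` consumes the iterator up to the match


# Hypothetical motif sites for one protein.
order = motif_order([(3, 120), (1, 30), (6, 75), (4, 150)])
print(order, contains_in_order(order, [1, 6, 3]))
# A protein lacking motif 6 (as reported for PtUC1) fails the check.
print(contains_in_order([1, 3, 4], [1, 6, 3]))
```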

Fig. 4

Schematic representation of the 10 conserved motifs in PtPC proteins. Motifs of the PtPC proteins were identified by MEME online tool. Each motif is represented by a differently coloured block, with their numbers in the centre of the motifs. The number in boxes (1–10) represents motif 1–motif 10, respectively. The position and length of each coloured box represents the actual motif size

Chromosomal location, gene duplication and conserved microsynteny

Based on the chromosomal location map, the 74 PtPCs were unevenly distributed across 17 of the 19 poplar chromosomes (Fig. 5). Chromosome 1 has the largest number of PC genes (14), chromosomes 8 and 19 each have only one PtPC gene, and chromosomes 4, 11, 13, 15, 16 and 17 contain only ENODL subfamily genes.

Fig. 5

Chromosomal locations of the 74 predicted PtPC genes. The chromosome number is indicated above each chromosome. Duplicated paralogous pairs of PtPC genes and segmentally duplicated genes are connected by black and red dashed lines, respectively

Many plant gene families have expanded through segmental or tandem duplication. To better understand the evolution of poplar PC genes, we investigated genome duplication events in this family. Duplicated genes were confirmed using the Vista Synteny browser (http://pipeline.lbl.gov/cgi-bin/gateway2), which indicated that 11 gene pairs originated from segmental duplication (Table S3). Neighbouring genes were also analyzed to determine whether tandem duplication had occurred (Ye et al. 2009), but there was no evidence of tandem duplication among PtPCs.
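The tandem-duplication check on neighbouring genes can be sketched as a position test on paralog pairs. The text does not state the exact criterion used (it cites Ye et al. 2009). This sketch therefore assumes, as a labelled assumption, that two paralogs on the same chromosome separated by at most five intervening genes count as tandem duplicates. The gene orders and pairs are hypothetical.

```python
from typing import Dict, List, Set, Tuple

MAX_INTERVENING = 5  # assumed criterion; not specified in the text


def tandem_pairs(
    gene_order: Dict[str, List[str]],
    paralog_pairs: Set[Tuple[str, str]],
    max_intervening: int = MAX_INTERVENING,
) -> List[Tuple[str, str]]:
    """Paralog pairs on the same chromosome separated by <= max_intervening genes."""
    position = {g: (chrom, i) for chrom, genes in gene_order.items() for i, g in enumerate(genes)}
    result: List[Tuple[str, str]] = []
    for a, b in sorted(paralog_pairs):
        if a not in position or b not in position:
            continue  # gene not placed on the assembly
        (chrom_a, ia), (chrom_b, ib) = position[a], position[b]
        if chrom_a == chrom_b and abs(ia - ib) - 1 <= max_intervening:
            result.append((a, b))
    return result


# Hypothetical gene orders and paralog pairs.
gene_order = {"Chr01": ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"], "Chr02": ["h1", "h2"]}
pairs = {("g1", "g3"), ("g1", "g8"), ("g2", "h1")}
print(tandem_pairs(gene_order, pairs))
```

Only g1/g3 qualifies here: g1/g8 are separated by six genes, and g2/h1 lie on different chromosomes.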

To examine the evolutionary selection process, we calculated Ka/Ks ratios for 10 pairs of PtPC paralogs (Table 2). Almost all Ka/Ks ratios were < 0.5, except that of the PtSC8/PtSC17 pair, which was > 1. Moreover, the duplication events of the 10 gene pairs were estimated to have occurred between 10.20 and 55.80 million years ago (Mashiguchi et al. 2004). Sliding-window analysis of Ka/Ks ratios was also performed for all PC paralog pairs (Fig. S3).
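The reading of these ratios follows the usual convention: Ka/Ks < 1 indicates purifying selection, Ka/Ks = 1 neutral evolution, and Ka/Ks > 1 positive selection. The minimal sketch below encodes only that convention. It does not test statistical significance, and the Ka and Ks inputs are hypothetical rather than values from Table 2.

```python
def selection_regime(ka: float, ks: float) -> str:
    """Interpret a Ka/Ks ratio using the conventional thresholds around 1."""
    if ks <= 0:
        raise ValueError("Ka/Ks is undefined when Ks is zero")
    ratio = ka / ks
    if ratio < 1:
        return "purifying selection"
    if ratio > 1:
        return "positive selection"
    return "neutral evolution"


# Hypothetical values: most paralog pairs look like the first call, PtSC8/PtSC17 like the second.
print(selection_regime(0.05, 0.2))
print(selection_regime(0.3, 0.2))
```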

Table 2 Ka, Ks and Ka/Ks values calculated for paralogous PC gene-pairs (Pt–Pt) in the Populus trichocarpa genome

To probe the relationships between homologous genes, we investigated the microsynteny between poplar and Arabidopsis sequences (Fig. S4). Pairwise comparison of flanking genes in chromosomal regions containing PC genes revealed three or more pairs displaying conserved microsynteny. Analysis of intraspecies microsynteny identified 23 collinear gene pairs in poplar (Fig. S5). Microsynteny analysis was also performed to evaluate the relationship between paralogous and orthologous PC genes. It identified 15 orthologous gene pairs with two-for-one microsynteny between Populus and Arabidopsis. Only six pairs showed one-for-one microsynteny, including AtENODL20-PtENODL5, At1g22480-PtUC3 and AtENODL8-PtENODL2.

Identification of cis-regulatory elements in the promoters of PtPCs

To investigate gene function and regulation, we analyzed cis-regulatory elements in the promoters of PtPCs (Table S4). Previous studies have shown that cis-regulatory elements play a pivotal role in controlling growth and reproductive development, phytohormone responses, and abiotic and biotic stress responses. As shown in Fig. 6, the endosperm expression elements Skn-1_motif and GCN4_motif were found in 17 and 6 PtPCs, respectively (Washida et al. 1999). The Skn-1_motif was the more widespread of the two, and PtSC6 contained the most Skn-1_motif copies. The CAT-box, RY-element and CCGTCC-box were each present in two PtPCs, whereas MBSII, HD-Zip1 and HD-Zip2 were each present in only one PtPC gene (Sessa et al. 1993; Bobb et al. 1997). We also identified in PtPC promoters the circadian control element and the O2-site, which is involved in the regulation of zein metabolism. In addition, ten cis-elements related to phytohormone responses were detected (Anderson et al. 1994). The CGTCA and TGACG motifs, which are related to MeJA responsiveness, were found in seven PtPCs (Nejad et al. 2012). The TCA element, ABRE, ERE and TGA element were found in the promoters of 13, 8, 6 and 3 PtPCs, respectively (Goldsbrough et al. 1993; Shen and Ho 1995). Other elements, including the AuxRR core, TATC-box, GARE motif and P-box, were observed in 1, 2, 3 and 4 PtPCs, respectively (Washida et al. 1999). We also identified cis-elements related to abiotic and biotic stresses, including HSE and TC-rich repeats in 14 PtPCs, whereas the WUN motif was present only in PtSC6.
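Element counts like these depend on whether both DNA strands are scanned. For example, TGACG and CGTCA are reverse complements of each other, so one MeJA-responsive site appears as TGACG on one strand and CGTCA on the other. The sketch below counts exact-match, overlapping occurrences of a motif on both strands, counting palindromic motifs only once. It is a simplification and not PlantCARE's algorithm, which also handles degenerate consensus sequences. The promoter strings are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_overlapping(seq: str, motif: str) -> int:
    """Number of (possibly overlapping) exact occurrences of `motif` in `seq`."""
    count, start = 0, seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def count_element(promoter: str, motif: str) -> int:
    """Occurrences of a motif on either strand; palindromic motifs are counted once."""
    promoter, motif = promoter.upper(), motif.upper()
    rc = reverse_complement(motif)
    forward = count_overlapping(promoter, motif)
    reverse = 0 if rc == motif else count_overlapping(promoter, rc)
    return forward + reverse


# Hypothetical promoter fragment with a single TGACG site on the plus strand.
print(count_element("AATGACGTT", "TGACG"), count_element("AATGACGTT", "CGTCA"))
```

Both calls report the same single site, which is why TGACG and CGTCA hits are usually interpreted together.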

Fig. 6

Cis-acting elements in the promoter regions of poplar PC genes. a Number of each cis-acting element in the promoter region (2 kb upstream of the translation start site) of PtPC genes. b Statistics for all PtPC genes, showing the corresponding cis-acting elements (red dots) and the total number of cis-acting elements in the PtPC gene family (black boxes). Based on functional annotation, the cis-acting elements were classified into three major classes: plant growth and development, phytohormone responsive, and abiotic and biotic stress-related (detailed results shown in Table S4)

Expression of PtPC genes

To reveal the different evolutionary fates of duplicated genes, we analyzed the expression patterns of PtPCs in six poplar tissues: young leaves, roots, xylem, female catkins, male catkins and mature leaves (Fig. S6; Table S5). The heat map showed that 11 PtPCs (PtPLC1, PtENODL43, PtENODL7, PtSC6, PtUC2, PtENODL45, PtSC9, PtENODL41, PtSC10, PtENODL25, PtUC1) were highly expressed in all six tissues. Notably, PtPLC1 and PtUC4 had relatively high expression levels only in roots and xylem, respectively.

Examination of PC gene expression by qRT-PCR

To examine the expression of PtPCs in response to stress, we used qRT-PCR to analyze the relative expression of 18 PtPCs under drought and salt treatments (Figs. 7, 8). Under drought treatment, three PtPCs (PtSC9, PtENODL24 and PtENODL41) were highly expressed at all four time points, and some PtPCs were up-regulated only at particular time points. For instance, PtPLC1 was up-regulated only at 4 h after treatment, whereas PtUC7, PtSC13 and PtSC17 were distinctly up-regulated at 24 h. By contrast, PtENODL9 expression remained low at all time points, and PtENODL45 was markedly down-regulated at 24 h after treatment.

Fig. 7

Expression profiles of PtPC genes in response to drought treatment as determined by qRT-PCR. A heat map shows the hierarchical clustering of the relative expression of 18 PtPC genes under drought treatment. Blue indicates lower and red indicates higher transcript abundance compared with the relevant control. The leaves were sprayed with 20% PEG-6000 and sampled after 4, 8, 12 and 24 h of treatment. Relative expression levels of the 18 PC genes were examined by qRT-PCR and normalized to the reference gene UBQ (Potri.001G418500)

Fig. 8

Expression profiles of PtPC genes under NaCl treatment as determined by qRT-PCR. A heat map shows the hierarchical clustering of the relative expression of 18 PtPC genes under NaCl treatment. Blue indicates lower and red indicates higher transcript abundance compared with the relevant control. Salt stress was imposed by watering the plants with a 200 mM NaCl solution, and plants were sampled after 4, 8, 12 and 24 h of treatment. Relative expression levels of the 18 PC genes were examined by qRT-PCR and normalized to the reference gene UBQ (Potri.001G418500)

We then analyzed the expression patterns of PtPCs under salt treatment. PtENODL9 expression remained low at all time points, whereas PtPLC1 was down-regulated at 4 h but up-regulated at the other time points. PtENODL24 and PtENODL41 were highly expressed at all time points under both drought and salt treatments. Expression of three PtPCs (PtUC7, PtSC13 and PtSC17) was significantly increased at 24 h after salt treatment. Additionally, PtUC2 was distinctly up-regulated at all time points except 12 h. PtUC7, PtSC13 and PtSC17 showed similar expression patterns under drought and salt treatments at all time points.

To predict possible functions of PtPC genes in organ development, we also used qRT-PCR to examine the relative expression of the 18 PtPCs in five tissues: young leaves (YL), roots (RT), xylem (XY), mature leaves (ML) and phloem (PH) (Fig. 9; Table S6). As shown in Fig. 9, each tissue had a set of PtPCs with the highest mRNA accumulation:

- three in roots (PtUC1, PtENODL45, PtENODL41)
- three in young leaves (PtENODL24, PtENODL9, PtSC6)
- four in mature leaves (PtENODL24, PtSC17, PtENODL7, PtUC7)
- four in phloem (PtENODL9, PtENODL25, PtENODL33, PtENODL43)
- five in xylem (PtUC2, PtUC4, PtSC10, PtSC13, PtPLC1)

However, many genes had markedly different expression patterns across tissues; for example, PtENODL45 was expressed at low levels in mature leaves. Notably, PtSC17 and PtENODL24 were both up-regulated in mature leaves, while they were expressed at relatively low levels in young leaves and xylem, respectively.

Fig. 9

The qRT-PCR analysis of expression profiles. A heat-map shows the hierarchical clustering of the relative expression of 18 PtPC genes across the five different tissues analyzed. The vertical colour scale at the right of the image represents log2 expression values: red indicates a high level and blue represents a low level of transcript abundance. Relative expression levels of all PtPCs were examined by qRT-PCR and normalized with respect to the reference gene UBQ (Potri.001G418500) in different tissues. RT roots, YL young leaves, ML mature leaves, XY xylem, PH phloem

Discussion

Genome-wide analyses of PC genes have previously been performed in various species, including Arabidopsis thaliana (Mashiguchi et al. 2009), Oryza sativa (Ma et al. 2011), Brassica rapa (Li et al. 2013), Zea mays (Cao et al. 2015) and Phalaenopsis equestris (Xu et al. 2017). These studies all indicated roles for PCs in plant growth and development (Fedorova et al. 2002; Ozturk et al. 2002; Diab et al. 2004; Ma and Jie 2010; Wu et al. 2011), but PC genes have not been investigated extensively in poplar. In the present study, we identified 74 PtPCs and grouped them into seven clades based on phylogenetic analysis. Most poplar genes of each subfamily grouped tightly with members of the same subfamily from Arabidopsis and Zea mays. This suggests that the PC subfamilies were already established in a common ancestor before the divergence of the monocot and dicot lineages. AGPs are a subfamily of hydroxyproline-rich glycoproteins (HRGPs). They are involved in plant growth and development, including stem strength, somatic embryogenesis in cotton, cell culture and extracellular signal transduction (Tan et al. 2004; Seifert and Roberts 2007; MacMillan et al. 2010; Poon et al. 2013; Ma et al. 2017). Several bioinformatic approaches have been used to identify AGP gene families in plants:

- PAST amino acid bias was calculated for AGPs in Arabidopsis and rice (Schultz et al. 2002; Ma and Zhao 2010).
- The BIO OHIO program was used for HRGPs in Arabidopsis and poplar (Showalter et al. 2010, 2016).
- A Python script named Finding-AGP was applied to 47 plant species (Ma et al. 2017).
- The MAAB bioinformatics pipeline was used to classify HRGPs (Johnson et al. 2017).
- BLAST searches were performed for AtENODLs, OsPCs and PePCs (Mashiguchi et al. 2009; Ma et al. 2011; Xu et al. 2017).
Through these methods, classical AGPs, lysine-rich AGPs, AG peptides, fasciclin-like AGPs, plastocyanin AGPs and other chimeric AGPs have been identified in Arabidopsis, rice and poplar. A protein was classified as a chimeric AGP if its sequence contained at least one arabinogalactosylated domain and a domain with an unrelated motif (Schultz et al. 2002). In our study, 43 PtPCs contain AG glycomodules and SPs and might therefore belong to the chimeric AGP group: 27 PtENODLs, five PtUCs, nine PtSCs and two PtPLCs. Showalter et al. (2016) identified 39 PtPAGs in poplar using the BIO OHIO 2.0 bioinformatics program. All of them except two (PtPAG25 and PtPAG27) were among the AGPs identified in the present study. Moreover, eight additional PtPCs (PtENODL2/7/8/12/19/21/25/31) might be members of the AGP superfamily because they contain putative arabinogalactan glycomodules. A previous report identified 18 ENOD-like, seven UC-like and four SC-like AGPs in Arabidopsis (Li et al. 2013). Poplar thus has about 1.5 times as many ENOD-like, 0.71 times as many UC-like (that is, fewer) and 2.25 times as many SC-like AGPs as Arabidopsis. These differences may result from the evolutionary diversification of the two plants.
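The fold differences quoted above can be recomputed from the counts given in the same paragraph: 27, 5 and 9 poplar PtPCs with AG glycomodules and SPs, against 18, 7 and 4 Arabidopsis AGPs. A ratio below 1 means poplar has fewer members, not more.

```python
# Counts as given in the Discussion text.
poplar = {"ENOD-like": 27, "UC-like": 5, "SC-like": 9}
arabidopsis = {"ENOD-like": 18, "UC-like": 7, "SC-like": 4}

# Ratio of poplar to Arabidopsis members per group, rounded to two decimals.
ratios = {group: round(poplar[group] / arabidopsis[group], 2) for group in poplar}
for group, ratio in ratios.items():
    print(f"{group}: {ratio} times as many in poplar")
```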

Multiple sequence alignment confirmed that Cys residues involved in the formation of disulfide linkages are highly conserved in all 74 PtPCs, suggesting they are essential for maintaining PCLD structure, function and stability. PCs are involved in electron transport in the cytomembrane. This process is related to photophosphorylation and can affect ATP generation, and therefore influences physiological processes in plants.

Several reports on the subcellular localization of PCs have provided information on their specific functions in plants. For instance, AtSC3 is located in the plasma membrane and is associated with aluminum and oxidative stresses (Ezaki et al. 2001, 2005), and it may also respond to other abiotic stresses. Khan et al. (2007) analyzed the phenotypes of homozygous T-DNA insertion mutants, including AtENODL9-1. They found that AtENODL9, which is located in the sieve element plasma membrane, plays a more significant role in reproductive processes than in other physiological processes. All known PCs have secretion signals, which are essential for their localization to the extracellular space. According to previous reports, more than half of ENODLs are predicted to be GPI-anchored proteins, implying that they function at the plasma membrane or in the extracellular matrix. Many proteins exist in the cell in dynamic forms and can perform diverse functions in different subcellular compartments. In addition, plants often grow under a variety of environmental conditions, which may cause some proteins, such as transcription factors, to appear in different organelles during signal transduction. Our predictions indicated that most PtPCs localize to the plasma membrane or the extracellular space. The differing subcellular localizations of phytocyanins may reflect their diverse functions in poplar. PtPCs, which were divided into four subfamilies, may play a significant role in poplar growth and development.

The exon/intron structure and motif arrangement are highly conserved within each subfamily. The 74 PtPCs contain different numbers of introns, suggesting functional diversification during their evolution. The similar exon/intron and motif compositions of members within subfamilies imply functional conservation and support the subfamily classification obtained from phylogenetic analysis. Motifs 1, 6 and 3 occur in the order 1-6-3 in most sequences, indicating strong conservation of domain structure. Motifs 7, 9 and 10 are present only in members of the ENODL subfamily, indicating a role in the functional divergence of PtENODLs. Notably, all poplar PLCs lack motif 7. This suggests divergence from other subfamilies during evolution and a possible specific role in plant growth and development, which could be important for their functions in electron transport. Each subfamily thus has distinct motifs, which most probably form the structural basis for the diverse functions of PtPCs.

After germination, the endosperm supplies diverse nutrients for seedling growth until the seedling becomes photosynthetically self-sufficient (Zhang et al. 2016). Most PtPCs contain Skn-1_motif cis-regulatory elements, which are related to endosperm expression. Promoter analysis also showed that the majority of PtPCs contain TC-rich repeat cis-regulatory elements, suggesting that PtPC genes may also play a significant role in stress responses. BcBCP1, an ENOD-like gene from Boea crassifolia, increased tolerance to osmotic stress in transgenic tobacco when expressed under the control of the CaMV 35S promoter. Ezaki et al. (2005) found that an Arabidopsis blue-copper-binding protein gene reduced aluminum absorption and thereby protected plants and Saccharomyces cerevisiae (yeast) from aluminum toxicity. In the present study, PtENODL24, whose promoter contains a MYB cis-element, was strongly induced by drought treatment, consistent with the promoter cis-element analysis.
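
Promoter cis-element surveys of this kind are normally run with dedicated tools such as PlantCARE, which also handle degenerate and position-specific matches. As a minimal sketch of the underlying idea only, assuming exact string matching, the Python functions below (the names `find_element` and `reverse_complement` are hypothetical) report every occurrence of a short element on either strand of a promoter sequence, in forward-strand coordinates. The element and promoter fragment in the example are placeholders, not sequences from P. trichocarpa.

```python
from typing import List, Tuple

# Watson-Crick complements; N (unknown base) maps to itself.
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def find_element(promoter: str, element: str) -> List[Tuple[int, str]]:
    """Find exact hits of a cis-element on both strands of a promoter.

    Returns (0-based start on the forward strand, strand) tuples. A hit of the
    element's reverse complement on the forward strand is the same element
    read on the reverse strand, so it is reported with strand "-".
    """
    promoter = promoter.upper()
    element = element.upper()
    rc = reverse_complement(element)
    k = len(element)
    hits: List[Tuple[int, str]] = []
    for i in range(len(promoter) - k + 1):
        window = promoter[i:i + k]
        if window == element:
            hits.append((i, "+"))
        # Skip palindromic elements here so they are not counted twice.
        if rc != element and window == rc:
            hits.append((i, "-"))
    return hits
```

For example, `find_element("TTGTCATAAATGACAA", "GTCAT")` returns `[(2, '+'), (9, '-')]`; the second hit is ATGAC, the same element on the opposite strand. Counting such hits per gene across all promoters is what statements like "most PtPCs contain element X" summarize.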

Gene duplication is a vital source of new genes during evolution (Lynch and Conery 2000; Gu et al. 2003; Khan et al. 2007) and can also help organisms adapt to different environments during growth and development (Bowers et al. 2003). Duplication mechanisms such as unequal crossing over, retroposition (reverse transcription) or chromosomal and whole-genome duplication produce a gene or sequence that resembles the original gene (Zhang 2003). Previous studies indicated that half of the OsPCs in rice resulted from segmental duplication and the others from tandem duplication, indicating that these two types of duplication event played equally important roles in the expansion of the OsPC gene family (Ma et al. 2011). In the present study, the conserved regions of 10 PtPC paralogs resulted from segmental duplication events, suggesting that segmental duplication played a crucial role in the expansion of the PtPC gene family.
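
To make the distinction between duplication types concrete, the sketch below assigns a paralog pair to a duplication class from gene positions and a precomputed set of collinear pairs. It is loosely modeled on the classes reported by collinearity tools such as MCScanX (segmental first, then tandem, proximal and dispersed), but the rule set, the 20-gene proximal window, the function name `classify_pair` and all gene names are illustrative assumptions, not the criteria applied to PtPCs or OsPCs.

```python
from typing import Dict, FrozenSet, Set, Tuple

# (chromosome, rank of the gene along that chromosome); hypothetical layout.
GenePos = Tuple[str, int]


def classify_pair(a: str, b: str,
                  positions: Dict[str, GenePos],
                  collinear_pairs: Set[FrozenSet[str]],
                  proximal_window: int = 20) -> str:
    """Assign a duplicated gene pair to one duplication class.

    Priority: segmental (pair lies in a collinear block) > tandem
    (adjacent genes on the same chromosome) > proximal (same chromosome,
    within `proximal_window` genes; an assumed cut-off) > dispersed.
    """
    # Pairs are unordered, so look them up as frozensets.
    if frozenset((a, b)) in collinear_pairs:
        return "segmental"
    chrom_a, rank_a = positions[a]
    chrom_b, rank_b = positions[b]
    if chrom_a == chrom_b:
        gap = abs(rank_a - rank_b)
        if gap == 1:
            return "tandem"
        if gap <= proximal_window:
            return "proximal"
    return "dispersed"
```

With hypothetical positions `{"g1": ("Chr01", 10), "g2": ("Chr01", 11), "g3": ("Chr05", 200)}` and g1 and g3 recorded as collinear, the pair (g1, g2) is classified as tandem and (g1, g3) as segmental. Tallying these labels over all paralog pairs gives the kind of segmental-versus-tandem proportions discussed above.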

The Ka/Ks ratio can be used to measure the historical selection pressure acting on coding sequences (Vandepoele et al. 2003; Wu et al. 2016). In this study, Ka/Ks sliding window analysis indicated that most PtPCs were under negative selection (Ka/Ks < 1), implying strong selective constraint, that is, purifying selection, on PtPC genes. Nevertheless, positive selection might have promoted the functional divergence of some duplicated genes during evolution, which may have facilitated the adaptation of plants to different environments. By estimating the duplication times of paralog pairs, we inferred that all large-scale duplication events involving PtPCs occurred approximately 10.20–55.80 million years ago. Microsynteny analysis within poplar identified 23 collinear PC gene pairs, indicating a low degree of divergence during evolution. We also observed microsynteny between Populus and Arabidopsis PC genes and identified 15 orthologous pairs, indicating that orthologs derived from common ancestral genes have been retained in both species.
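
The two calculations behind this paragraph can be sketched as follows: interpreting Ka/Ks relative to 1 (below 1 for purifying selection, above 1 for positive selection), applied here to hypothetical sliding windows, and converting Ks into a duplication time with the common formula T = Ks/(2λ). The substitution rate λ is an assumption: the default of 9.1 × 10^-9 synonymous substitutions per site per year is a placeholder, since the section does not state which rate produced the 10.20–55.80 million year estimate. All function names and inputs are illustrative.

```python
from typing import List, Sequence, Tuple


def selection_mode(ka: float, ks: float) -> str:
    """Classify selection on a coding sequence from its Ka/Ks ratio."""
    if ks <= 0:
        raise ValueError("Ks must be positive for Ka/Ks to be defined")
    omega = ka / ks
    if omega < 1:
        return "purifying"
    if omega > 1:
        return "positive"
    return "neutral"


def positive_windows(windows: Sequence[Tuple[float, float]]) -> List[int]:
    """Indices of sliding windows (given as (Ka, Ks)) whose Ka/Ks exceeds 1.

    Windows with Ks == 0 are skipped because the ratio is undefined there.
    """
    return [i for i, (ka, ks) in enumerate(windows) if ks > 0 and ka / ks > 1]


def duplication_time_mya(ks: float, rate: float = 9.1e-9) -> float:
    """Duplication time T = Ks / (2 * rate), returned in millions of years.

    `rate` (synonymous substitutions per site per year) is an assumed value.
    """
    return ks / (2 * rate) / 1e6
```

Under this assumed rate, a paralog pair with Ks = 0.5 would be dated to roughly 27.5 million years ago; a different λ rescales every estimate proportionally, which is why the rate should always be reported alongside the dates.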

Because expression analysis provides valuable information on relative expression levels and helps to clarify PC gene function in P. trichocarpa, we examined the expression of PtPCs in different poplar tissues using microarray data. Most genes were highly expressed in roots and xylem, suggesting roles in vegetative growth. Comparison of the microarray and qRT-PCR results showed that most PtPCs had similar expression patterns in the same organs. However, some results differed. For example, PtENODL41 was expressed at low levels in roots according to the microarray data but at relatively high levels in the same tissue according to qRT-PCR, which might result from differences between the experiments, such as growth conditions, plant age and sampling time. In addition, abiotic stresses such as high salinity and high temperature affect plant growth and development, and many stress-response genes are activated to help plants cope with adverse conditions. Hence, it is important to identify the major regulatory pathways of stress responses in poplar. The qRT-PCR results suggested important functions for PtPCs under drought and salt stress. Previous reports showed that OsUC23/26/27 and BrUC6/16 were highly expressed under drought or salt stress. These differences in expression among species indicate that PC genes in different plants may mediate diverse responses to abiotic stress.
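
The section does not say how relative expression was calculated from the qRT-PCR data. One widely used option is the 2^(-ΔΔCt) method, sketched below under that assumption: the Ct of the target gene is normalized to a reference gene within each condition, and the treated condition is then compared with the control. The method assumes similar (near 100%) amplification efficiency for target and reference genes; the function name `fold_change_ddct` and the Ct values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def fold_change_ddct(target_treated: Sequence[float],
                     ref_treated: Sequence[float],
                     target_control: Sequence[float],
                     ref_control: Sequence[float]) -> float:
    """Relative expression (treated vs. control) by the 2^(-ddCt) method.

    Each argument holds replicate Ct values; replicates are averaged first.
    Assumes target and reference genes amplify with similar efficiency.
    """
    # Normalize the target gene to the reference gene within each condition.
    dct_treated = mean(target_treated) - mean(ref_treated)
    dct_control = mean(target_control) - mean(ref_control)
    # Compare the normalized values between treated and control samples.
    ddct = dct_treated - dct_control
    return 2 ** (-ddct)
```

For instance, with mean target Ct values of 22.0 (treated) and 25.0 (control) against a reference gene at 18.0 in both, ΔΔCt = 4.0 - 7.0 = -3.0 and the fold change is 2^3 = 8, i.e. up-regulation under treatment; values below 1 indicate down-regulation.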

Conclusions

In the current study, we identified 74 PtPCs and systematically analyzed their PC domains, gene structures, gene duplication, chromosomal distribution and conserved microsynteny. We also investigated the evolutionary relationships between PCs in Arabidopsis and poplar, and the results revealed different expression patterns, indicating functional differentiation. The qRT-PCR analysis indicated important roles for PtPCs in many aspects of plant growth, development and stress response. To further understand the function and role of each PC family gene, multiple methods, including molecular genetic analysis, should be employed. Our findings provide a theoretical basis for further research on the function of PtPCs.

Author contribution statement

SSL and WFH conceived and designed the experiments, carried out the principal bioinformatics analysis and drafted the manuscript. Performed the experiments: SSL, WFH. Edited the data, figures and tables: YW, BL. Contributed reagents/materials/analysis tools: YX. All authors read and approved the final manuscript.