Introduction

MYB (myeloblastosis) transcription factors (TFs) comprise a large and functionally diverse group of proteins found in all eukaryotic organisms (Dubos et al. 2010). As one of the largest TF families in the plant kingdom, MYB TFs play important roles in regulating physiological and biochemical processes (Du et al. 2012; Su et al. 2015; Liao et al. 2016). MYB proteins are characterized by a highly conserved DNA-binding domain (DBD) located at the N-terminus that consists of 1–4 imperfect tandem repeats (R0, R1, R2 and R3) of approximately 52 amino acids, each of which forms three α-helices. Each MYB repeat contains regularly spaced tryptophan residues that form a hydrophobic core and stabilize the 3D structure of the protein. MYB TFs have been divided into four groups based on the number of repeated MYB domains: 4R-MYB, 3R-MYB (R1R2R3-MYB), 2R-MYB (R2R3-MYB), and 1R-MYB or MYB-related, containing four, three, two and single or partial repeats, respectively (Rosinski and Atchley 1998; Dubos et al. 2010). Among these groups, R2R3-MYBs are the most common TFs in the family and appear to be specific to plants and to yeast (Jin and Martin 1999).

The first gene described as containing a MYB domain was the v-myb oncogene, which is derived from the avian myeloblastosis virus (Klempnauer et al. 1982). Among plants, the first cloned MYB gene, C1, was identified in maize (Zea mays). This gene encodes a c-myb-like TF involved in the regulation of anthocyanin biosynthesis (Paz-Ares et al. 1987). Since then, an increasing number of MYB family members have been identified and characterized in a variety of vascular plants. There are currently 22,032 MYB and 15,369 MYB-related sequences available in the Plant Transcription Factor Database (http://planttfdb.cbi.pku.edu.cn/) (Jin et al. 2017). Specifically, the diverse functions of a variety of plant MYBs have been investigated through both genetic and molecular analyses, confirming that they are widely involved in regulating hormone signaling, primary and secondary metabolism, and cellular and developmental processes as well as plant responses to biotic and abiotic stresses (Jin and Martin 1999; Dubos et al. 2010; Dai et al. 2012).

Recent studies have shown that plant MYB TFs are involved in regulating plant homeostasis under various environmental stresses. For example, AtMYB2 expression is induced by dehydration and salt stress (Abe et al. 2003), and AtMYB15 plays a role in cold stress and improves drought and salt tolerance in Arabidopsis (Agarwal et al. 2006; Ding et al. 2009). In addition, OsMYB2 is involved in rice tolerance to salt, cold and dehydration stresses (Yang et al. 2012), and AmMYB1 from Avicennia marina enhances tolerance to NaCl stress in transgenic tobacco (Ganesan et al. 2012). Moreover, several reports have indicated that MYB family members are associated with tolerance to phosphorus deficiency. AtMYB62 appears to be involved in phosphate (Pi) starvation (Devaiah et al. 2009), and canola BnPHR1, encoding a MYB-like protein, is regulated by exogenous Pi and promotes uptake and homeostasis of Pi for plant growth (Ren et al. 2012). Furthermore, Wang et al. (2013) investigated the ability of Ta-PHR1, a wheat MYB gene that is transcriptionally activated under Pi deficient conditions, to enhance the efficiency of phosphorus use and yield performance under Pi starvation.

Although phosphorus (P) is an indispensable macronutrient that plays crucial roles in plant development and metabolism, the amount of P available for plant uptake is often limited due to high fixation and slow diffusion in soil (Shen et al. 2011). Masson pine (Pinus massoniana Lamb.), a conifer native to southern China’s tropical and subtropical regions, is severely affected by phosphorus deficiency (Zhang et al. 2010). In a previous transcriptomic profiling analysis, several MYB genes showed increased expression in P. massoniana seedlings subjected to phosphorus deficiency (Fan et al. 2014). Nevertheless, the identities and roles of MYB family genes in P. massoniana remain largely unknown. In the present study, transcriptome-wide identification of MYB TFs in P. massoniana was performed. Fifty-nine MYB family members were identified by searching for protein motifs and conserved domains. Phylogenetic analysis was conducted by comparing the retrieved P. massoniana MYB sequences with those identified in Arabidopsis, and expression profiles of these 59 MYB genes in response to different Pi-deficient conditions were analyzed by microarray. As MYB TFs are involved in diverse biological processes, our findings should provide a solid foundation for future research into the functional roles of P. massoniana MYB (PmMYB) genes.

Materials and methods

Identification of MYB TFs

Putative MYB TF sequences were retrieved from annotated protein sequences based on P. massoniana transcriptomic data (Fan et al. 2014). Arabidopsis MYB genes were obtained from The Arabidopsis Information Resource (TAIR) (Katiyar et al. 2012), and used as query sequences to ensure that no additional related genes were selected. The identified PmMYBs were rechecked through multiple alignment to remove duplicates, and the resulting sequences were subjected to open reading frame (ORF) prediction using OrfPredictor (Muthamilarasan et al. 2015). The peptide sequences were employed for the identification of MYB DBDs using the methodology of the Plant Transcription Factor Database (Jin et al. 2017). Amino acid motifs present in the deduced PmMYBs were analyzed using the MEME tool (version 4.11.3) (http://meme-suite.org/tools/meme), and a schematic diagram of these motifs was drawn. The following parameter settings were used: distribution of motifs, zero or one per sequence; maximum number of motifs to find, 10; minimum width of motif, 6; maximum width of motif, 250. For other options, the default values were employed. Only motifs with an E-value < 1e−20 were retained for further analysis. Structural verification of the MYB domains was performed using the Simple Modular Architecture Research Tool (SMART) (Letunic et al. 2012).
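The motif search settings listed above can be illustrated with a short sketch. It assumes a local command-line installation of MEME rather than the web server used in this study, and it only assembles the command without running it. The file name PmMYB_peptides.fasta, the output directory and the example motif records are hypothetical; the E-value filter is applied afterwards in Python as a separate step.

```python
import shlex


def build_meme_command(fasta_path, out_dir="meme_out"):
    """Assemble a MEME command line mirroring the settings stated in the text."""
    return [
        "meme", fasta_path,
        "-protein",        # peptide input
        "-oc", out_dir,    # output directory (hypothetical name)
        "-mod", "zoops",   # zero or one motif occurrence per sequence
        "-nmotifs", "10",  # maximum number of motifs to find
        "-minw", "6",      # minimum motif width
        "-maxw", "250",    # maximum motif width
    ]


def keep_significant(motifs, max_evalue=1e-20):
    """Retain motifs whose E-value is below the cutoff."""
    return [m for m in motifs if m["evalue"] < max_evalue]


cmd = build_meme_command("PmMYB_peptides.fasta")  # hypothetical input file
print(shlex.join(cmd))

# Hypothetical motif records, used only to show the filter.
example = [
    {"id": "motif_1", "evalue": 3.2e-150},
    {"id": "motif_x", "evalue": 4.0e-8},
]
print([m["id"] for m in keep_significant(example)])
```

Keeping the E-value cutoff as a post-processing step makes the threshold easy to vary without repeating the motif search.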

Multiple sequence alignment, phylogenetic analysis and digital expression profiles

Multiple sequence alignments of the MYB TFs were performed between Arabidopsis and P. massoniana using ClustalW with default parameters. Phylogenetic trees were generated based on the multiple sequence alignment using MEGA 5.2, and the neighbor-joining (NJ) method was adopted, with 1000 bootstrap replicates. Additionally, the biological functions of some PmMYBs were predicted based on the phylogenetic tree according to orthology.
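As a rough illustration of the inputs to a distance-based tree with bootstrap support, the sketch below computes a pairwise p-distance from aligned sequences and draws bootstrap replicates by resampling alignment columns with replacement. It is not the MEGA 5.2 procedure: the p-distance, the pairwise treatment of gaps, the toy alignment and the function names are all assumptions made for illustration, and the neighbor-joining step itself is omitted.

```python
import random


def p_distance(a, b):
    """Proportion of differing residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable columns")
    return sum(x != y for x, y in pairs) / len(pairs)


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap pseudo-alignment)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


# Toy alignment with hypothetical sequence names.
alignment = {"seqA": "WGKS-W", "seqB": "WGRSAW", "seqC": "WNRSAW"}

rng = random.Random(0)  # fixed seed so the resampling is reproducible
replicates = [bootstrap_replicate(alignment, rng) for _ in range(1000)]

print(p_distance(alignment["seqA"], alignment["seqB"]))
print(len(replicates))
```

In a full analysis, a tree would be built from each replicate, and the bootstrap value of a clade would be the fraction of replicate trees that recover it.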

Custom-designed 8 × 60-K DNA microarray chips (Agilent, Beijing, China) were constructed and used to estimate MYB gene expression based on previous transcriptomic data (Fan et al. 2014). Fluorescent dye (Cy3-dCTP) labeling of P. massoniana cDNA was performed as previously described using a cDNA Amplification and Labeling Kit (CapitalBio Corporation, Beijing, China) (Fan et al. 2014). The ratios of fluorescent signal intensities of each DNA element were measured to determine changes in gene expression. Microarray hybridization, washing, scanning and data analysis were performed at the Biochip National Engineering Research Center of Beijing, CapitalBio Corporation. Analysis was performed using Feature Extraction and GeneSpring GX software with robust multi-array average (RMA) normalization. Expression clusters of the PmMYB genes under different conditions were analyzed using Cluster, and a diagram was drawn using Tree View.

Plant materials and stress treatments

The plant materials and stress treatments were similar to those employed in a previous study (Fan et al. 2014). Masson pine seeds were surface-sterilized with 5% NaOCl for 5 min, followed by 70% ethanol for 10 min, rinsed in sterile water and soaked overnight at 30 °C. Seeds germinated after 7 days.

Three treatments involving different phosphorus levels were applied: (1) a control treatment: 5.0 mM KNO3, 4.5 mM Ca(NO3)2·4H2O, 2.0 mM MgSO4·7H2O, 0.5 mM KH2PO4, 46 µM H3BO3, 10 µM MnCl2·4H2O, 0.8 µM ZnSO4·7H2O, 0.56 µM CuSO4·5H2O, 0.4 µM H2MoO4·4H2O, and 25 µM Fe-NaEDTA; (2) P1: same as the control treatment except that 0.06 mM KH2PO4 was provided; and (3) P2: same as the control treatment except that 0.01 mM KH2PO4 was provided. The potassium lost through the reduction in KH2PO4 concentration was compensated for by adding KCl. Beginning 10 days after emergence (DAE), the nutrient solutions with different P concentrations were added every 2 days. Seedlings were harvested at 58 DAE, immediately frozen in liquid nitrogen and stored at −80 °C. All treatments were replicated in three pots.
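The potassium correction can be made concrete with a small calculation. The sketch below assumes that KCl was added equimolar to the KH2PO4 removed (one K+ per KH2PO4), which the text implies but does not state explicitly; the KCl amounts it prints are therefore derived values, not reported ones.

```python
CONTROL_KH2PO4_MM = 0.5  # KH2PO4 in the control solution (mM), from the text


def kcl_supplement_mm(kh2po4_mm, control_mm=CONTROL_KH2PO4_MM):
    """KCl (mM) needed to replace the K+ lost when KH2PO4 is lowered.

    Assumes equimolar replacement: each KH2PO4 removed takes one K+ with it.
    """
    if kh2po4_mm > control_mm:
        raise ValueError("treatment KH2PO4 exceeds the control level")
    return round(control_mm - kh2po4_mm, 6)


treatments = {"control": 0.5, "P1": 0.06, "P2": 0.01}
for name, kh2po4 in treatments.items():
    print(f"{name}: KH2PO4 {kh2po4} mM, KCl added {kcl_supplement_mm(kh2po4)} mM")
```

Under this assumption the total K+ supplied by KH2PO4 plus KCl stays at 0.5 mM in all three treatments, in addition to the 5.0 mM contributed by KNO3.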

RNA extraction and quantitative real-time reverse transcription polymerase chain reaction (qRT-PCR) validation

Total RNA from collected samples was extracted using an Invitrogen Plant RNA Isolation Kit according to the manufacturer’s protocols. First-strand cDNA was synthesized using an RNA LA PCR Kit (TaKaRa, Dalian, China) and the supplied oligo dT-adaptor primer. To verify the digital expression profiles obtained, six MYBs were randomly selected and quantitatively analyzed via qRT-PCR. Primer Premier 5.0 was used to design the primers for each PmMYB gene (Table S1). The samples and standards were examined in triplicate in each plate using a SYBR Select Master Mix Kit (Applied Biosystems, Carlsbad, USA) and a 7500 Fast Real-Time PCR System (Applied Biosystems) according to the manufacturer’s instructions. The relative transcript level of each gene was calculated from the comparative cycle threshold (Ct), which was normalized to the geometric mean of the Ct values of two internal control genes (18S rRNA and UBQ). Expression ratios were calculated via the \(2^{-\Delta\Delta C_{\text{t}}}\) method with correction for the PCR efficiency of each gene.
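The relative quantification described above can be sketched as follows. The Ct values are invented for illustration, the function names are hypothetical, and a single amplification base (2.0, i.e. 100% efficiency) is assumed for all genes; the per-gene efficiency correction mentioned in the text is not specified in enough detail to reproduce here.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def delta_ct(target_ct, reference_cts):
    """Ct of the target gene minus the geometric mean Ct of the reference genes."""
    return target_ct - geometric_mean(reference_cts)


def relative_expression(treated, control, amplification_base=2.0):
    """Expression ratio (treated / control) by the 2^(-ddCt) method.

    amplification_base = 2.0 assumes perfect doubling per cycle (an assumption).
    """
    ddct = (delta_ct(treated["target"], treated["refs"])
            - delta_ct(control["target"], control["refs"]))
    return amplification_base ** (-ddct)


# Invented Ct values: one target gene and two reference genes per sample.
control = {"target": 26.0, "refs": [16.0, 16.0]}
treated = {"target": 24.0, "refs": [16.0, 16.0]}

ratio = relative_expression(treated, control)
print(f"{ratio:.2f}")  # prints 4.00: the target Ct fell by two cycles relative to the references
```

A ratio above 1 indicates higher transcript abundance in the treated sample, and a ratio below 1 indicates lower abundance.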

Results

Transcriptome-wide identification of MYB TFs in P. massoniana

To identify MYB-related proteins from the previously reported full-transcriptomic data for Masson pine (Fan et al. 2014), BLASTP searches were performed using consensus motifs for the MYB domain as queries (Altschul et al. 1997). From among 70,896 unigenes, 73 MYB and MYB-related candidate protein sequences were filtered; redundant sequences of candidate MYBs were discarded from the dataset according to similarity. Structural and functional verification of MYB domains was also performed using ScanProsite and SMART tools. The sequences that included an apparent MYB-type DBD were confirmed to be MYB TF members. Fifty-nine non-redundant PmMYB transcription factors were ultimately identified, named and analyzed (Table S2). A total of 10 motifs were identified to illustrate the P. massoniana MYB protein structure using the MEME program. The positions of MYB/SANT domains and any conserved motifs are shown in Fig. 1 and Fig. S1. The number of motifs in PmMYBs ranged from 2 to 6, with lengths ranging from 8 to 223 amino acids. Most of the PmMYBs contained motifs 1 and 2, and most of the close members in the phylogenetic tree exhibited similar motif compositions, suggesting that proteins in the same subgroup may have similar functions (Fig. 1). Additionally, 39 typical R2R3-MYB TFs, three 3R-MYB proteins (PmMYB1, 22, 46) and one 4R-related MYB protein (PmMYB50) were found among these confirmed PmMYBs.

Fig. 1

Phylogenetic tree (left) and distribution of the conserved motifs (right) of 59 PmMYB proteins. The phylogenetic tree was constructed using MEGA 5.2, and MEME was employed to predict the motifs

Structural analysis of MYBs in P. massoniana

As shown in Fig. 1, motif 2 was found in all 59 MYB proteins. SMART analysis revealed four motifs (motifs 1, 2, 7 and 10) conserved in the MYB DBD; motif 3 was an HMG 17 domain; motif 5 an S 15 domain; motif 8 a coiled coil region; motif 9 a TFIIA domain; and motifs 4, 5, and 6 domains of unknown function.

Multiple amino acid sequence alignments using ClustalX2 showed that the 59 P. massoniana MYBs could be separated into four groups based on sequence similarity (Fig. S2). Further sequence alignment revealed that the 45 proteins of group 1 share the R2 and R3 MYB repeats and contain characteristic amino acids, including a series of highly conserved tryptophan (Trp, W) residues (Fig. 2a) in each DBD repeat. The five proteins of group 2 share a MYB-like domain containing a SHAQKYF amino acid signature motif (Fig. 2b). Sequence alignment of group 3 proteins indicated that the four MYBs contain a MYB-CC domain (Fig. 2c) located at the C-terminus (Ruan et al. 2017). The other five members of group 4 harbor non-conserved domains, except for motif 2.

Fig. 2

Highly conserved amino acid residues are present among the R2R3 (a), 1R (b) and MYB-CC (c) domains of the PmMYBs. The regularly spaced Trp residues present in each repeat are labeled with asterisks. The shading of the alignment indicates identical residues in black, conserved residues in dark gray, and similar residues in light gray

Phylogenetic analysis and functional prediction of the PmMYBs

To evaluate the evolutionary relationships and functions of the PmMYBs, comparative phylogenetic analysis was conducted with the identified P. massoniana MYB TFs and 127 Arabidopsis MYB TFs using the neighbor-joining method. The MYBs were divided into 19 groups (designated Group 1–Group 19) according to the sequence similarity and tree topology of the encoded proteins (Fig. 3). The 59 PmMYBs were dispersed among 15 MYB groups, whereas Groups 2, 6, 11 and 16 comprised only Arabidopsis MYBs. Furthermore, the number of homologs from the two species was highly asymmetrical. For example, one PmMYB and 12 AtMYBs were included in Group 1, and 15 PmMYBs and eight AtMYBs were included in Group 8. The results of the comparative phylogenetic analysis revealed considerable conservation and diversification of the MYB gene family in these two plant species.

Fig. 3

Comparative phylogenetic analysis of P. massoniana MYB proteins with those from Arabidopsis. The proteins were distributed into 19 groups

Homologous genes are widely distributed in different species and are generally assumed to perform equivalent or similar biological functions (Cai et al. 2012). The functions of most Arabidopsis MYB TFs have been well characterized experimentally, and phylogenetic analysis has identified several functional groups (Peng et al. 2016). Accordingly, PmMYBs were clustered into different Arabidopsis functional groups, including meristem formation, cell wall biogenesis, stress response, anthocyanin biosynthesis, cell differentiation and stomata development (Fig. 3). The results were consistent with the Gene Ontology annotation.

Expression profiles of identified PmMYB genes under different Pi-deficient conditions

To analyze PmMYB gene expression profiles under different Pi-deficient conditions, P. massoniana transcriptome microarray analyses were performed. Genes were considered differentially expressed if they were up-regulated more than twofold or down-regulated to less than 0.5-fold (P < 0.05); the data were clustered by fold-change value and are displayed as a heat map. Both the number of differentially expressed genes and the magnitude of their expression changes varied with the degree of Pi deficiency. Fourteen PmMYB genes were differentially expressed under both P1 and P2 stress, and 10 others were differentially expressed exclusively under the P2 treatment (Fig. 4 and Table S3), suggesting differential sensitivity of stress-induced genes to stress severity. Some PmMYBs that clustered in the same group or subgroup of the phylogenetic tree (Fig. 3) exhibited similar expression changes. For example, the expression level was increased for most PmMYBs in Group 4 and a subgroup (including PmMYB51, -52, -53) of Group 8, and functional prediction indicated that these genes play an important role in stress response. However, some PmMYBs within Group 7 showed completely different expression patterns, even though they were predicted to have similar functions.
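The differential-expression criteria and the split between genes responding under both treatments and genes responding only under P2 can be expressed as a minimal sketch. The gene names, fold changes and P values below are invented for illustration and do not come from Table S3; the function names are likewise hypothetical.

```python
import math


def classify(fold_change, p_value, up=2.0, down=0.5, alpha=0.05):
    """Label a gene 'up', 'down' or 'unchanged' using the thresholds stated in the text."""
    if p_value >= alpha:
        return "unchanged"
    if fold_change > up:
        return "up"
    if fold_change < down:
        return "down"
    return "unchanged"


def partition(p1, p2):
    """Return (genes differentially expressed under both treatments, genes only under P2).

    p1 and p2 map gene name -> (fold_change, p_value).
    """
    de_p1 = {g for g, (fc, p) in p1.items() if classify(fc, p) != "unchanged"}
    de_p2 = {g for g, (fc, p) in p2.items() if classify(fc, p) != "unchanged"}
    return de_p1 & de_p2, de_p2 - de_p1


# Invented values for three hypothetical genes.
p1 = {"geneA": (2.6, 0.01), "geneB": (1.3, 0.03), "geneC": (0.9, 0.40)}
p2 = {"geneA": (4.1, 0.004), "geneB": (0.35, 0.02), "geneC": (1.1, 0.55)}

both, p2_only = partition(p1, p2)
print(sorted(both), sorted(p2_only))

# log2 fold changes, the scale typically used to color a heat map.
log2_scores = {g: math.log2(fc) for g, (fc, _) in p2.items()}
```

On the log2 scale the two thresholds are symmetric (+1 and −1), which is why fold changes are usually log-transformed before clustering and display.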

Fig. 4

Heat map clustering of PmMYB gene expression levels under different phosphorus deficiency conditions. The microarray values were normalized via log2 transformation to represent color scores. Red represents up-regulation. Green represents down-regulation

To verify the observed levels of expression, six MYB genes (Table S1) were randomly selected and quantitatively analyzed by qRT-PCR. The qRT-PCR results were consistent with the microarray data (Fig. 5), supporting the reliability of the expression analysis.

Fig. 5

Expression profiles of six selected MYB TFs determined via microarray and qRT-PCR analysis. The P1 and P2 conditions are described in “Materials and methods” section

Discussion

MYB proteins regulate multiple biological processes and constitute one of the most abundant transcription factor (TF) families in plants. MYB proteins can be divided into different classes depending on the number of adjacent repeats, including the 1R-MYB or MYB-related, R2R3-MYB, 3R-MYB and 4R-MYB groups (Dubos et al. 2010). The 1R-MYB or MYB-related group is a heterogeneous class comprising proteins with a single or a partial MYB repeat. These proteins are involved in the control of cellular morphogenesis (Simon et al. 2007), secondary metabolism (Dubos et al. 2008), organ morphogenesis (Kerstetter et al. 2001), chloroplast development (Waters et al. 2009) and the response to Pi starvation (Rubio et al. 2001). In this study, four putative PHRs (PmMYB4, PmMYB19, PmMYB21 and PmMYB28), which are conserved MYB TFs involved in Pi starvation signaling, were identified in the P. massoniana transcriptome. Most plant MYB genes encode proteins of the R2R3-MYB class, which are thought to have evolved from a 3R-MYB gene ancestor (Rosinski and Atchley 1998), although the number of R2R3-MYB proteins varies among plant species. For instance, there are 126 R2R3-MYB proteins in Arabidopsis (Dubos et al. 2010), 109 in rice (Katiyar et al. 2012), 192 in Populus (Wilkins et al. 2009), and 108 in grape (Matus et al. 2008). In our study, 39 putative R2R3-MYB proteins were identified from P. massoniana transcriptomic data, along with three typical 3R-MYB proteins and one 4R-related MYB protein. Genes encoding 3R-MYB proteins, which regulate progression through cell cycle transitions, are considered an ancient and evolutionarily conserved family (Peng et al. 2016). In contrast, little is known about the role of 4R-MYB proteins in plants.

In the present study, 10 motifs were predicted by MEME, four of which are MYB/SANT domains according to the SMART analysis. However, these motifs did not exhibit the basic features of whole MYB DBDs, indicating that few DBD sequences are well conserved across the diverse PmMYB proteins. Most of these PmMYB proteins share the R2 and R3 MYB repeats and contain characteristic amino acids, including a series of highly conserved W residues (Fig. 2a) in each DBD repeat. These conserved W residues are known to play key roles in sequence-specific DNA binding and to serve as landmarks of plant MYB proteins (Ogata et al. 1992; Wang et al. 2015). The proteins of group 2 share a MYB domain similar to the second MYB repeat in tomato LeMYB, which acts as a transcriptional activator in yeast and plants (Rose et al. 1999; Wang et al. 2015), and contains a SHAQKYF amino acid signature motif in the MYB-like repeat (Fig. 2b). Group 3 proteins exhibit a MYB-CC domain (Fig. 2c) located at the C-terminal end of the protein (Ruan et al. 2017).

Considering all MYB-related proteins, it is clear that a wide diversity of functions exists. As the biological functions of proteins are correlated with the primary structures within subgroups (Dubos et al. 2010), it is useful to predict gene function by identifying orthologs between plants based on evolutionary relationships. Arabidopsis MYB TFs have been well characterized experimentally and in recent years have been employed to predict the functional annotations of MYB TFs in other plants, such as barley (Tombuloglu et al. 2013), sugar beet (Stracke et al. 2014), pear (Feng et al. 2015) and Jatropha curcas L. (Peng et al. 2016). To deduce the biological functions of P. massoniana MYBs, comparative phylogenetic analysis was conducted with Arabidopsis MYB TFs. Referring to the functional categories for Arabidopsis (Dubos et al. 2010), all PmMYB proteins clustered into Arabidopsis functional groups (Fig. 3). A small number of plant proteins containing three MYB repeats, known as MYB3R or R1R2R3-MYB, are related to the regulation of mitosis (Ito 2005). Kobayashi et al. (2015) found that AtMYB3R TFs are important for repressing the expression of specific genes during the cell cycle and establishing a post-mitotic quiescent state during organ size determination. The three confirmed P. massoniana 3R-MYB proteins clustered into the same subgroup as the AtMYB3R TFs, which suggests a similar function in regulating growth in P. massoniana. Gene duplication is considered to be the primary driving force underlying new gene functions (Liao et al. 2016), and indeed, the expansion of the MYB family in plants suggests that MYB genes may play a variety of roles in plant-specific processes. The R2R3-MYB family is large, and its functions in plant development and environmental responses are diverse (Jin and Martin 1999). In the present study, many P. massoniana R2R3-MYBs clustered into Group 4, Group 7 and Group 8, members of which have been shown to be involved in stress responses in Arabidopsis, and this facilitated the identification of PmMYB genes that may play a role in the response to stress conditions.

Previous studies in plants have revealed that numerous MYB family members are widely involved in responding to diverse abiotic stresses and are differentially expressed under drought, salt and other stress conditions. In Arabidopsis, AtMYB2, AtMYB74 and AtMYB102 are up-regulated by drought stress (Urao et al. 1993; Abe et al. 2003; Denekamp and Smeekens 2003). In addition, AtMYB96 contributes to drought resistance by regulating lateral root meristem activation or transcriptional activation of cuticular wax biosynthesis (Seo and Park 2009; Seo et al. 2011). Kim et al. (2013) reported that AtMYB73 is a negative regulator of salt-sensitive induction in response to salt stress; AtMYB73 is strongly induced by salt stress but not by other stresses. Lotkowska et al. (2015) recently reported that AtMYB112, a formerly unknown regulator, promotes anthocyanin accumulation under salt stress and high-light stress. In addition, AtMYB62 expression has been reported to be induced in response to Pi starvation (Devaiah et al. 2009), and AtMYB2 regulates the plant response to Pi deficiency by regulating expression of the miR399 gene (Baek et al. 2013). All of these findings confirm that MYB genes are involved in plant responses to abiotic stress. A gene whose expression pattern resembles that of a closely grouped paralog is likely to be functionally similar (Tombuloglu et al. 2013). Based on microarray analyses, we observed 24 P. massoniana MYB genes to be differentially expressed under Pi deficiency stress, including 17 up-regulated genes and seven down-regulated genes, and the similarity in expression of these PmMYB genes was found to be related to their phylogeny (Figs. 3, 4). These results suggest that these PmMYB genes are transcriptionally regulated by Pi starvation signaling. In addition, examination of the expression patterns of PmMYB TFs showed some MYBs to be differentially expressed only under severe Pi starvation conditions, suggesting differential sensitivity of stress-induced genes to stress severity. Interestingly, four putative PHRs (PmMYB4, -19, -21 and -28), which are conserved MYB TFs involved in Pi starvation signaling (Mehra et al. 2017; Pandey et al. 2017), showed no significant differences in expression under either the P1 or P2 conditions. This observation is consistent with the report by Rubio et al. (2001) that PHR1, which acts downstream in the Pi starvation signaling pathway, is expressed under Pi-sufficient conditions and is only weakly responsive to Pi starvation.

These results provide insight into the structural and functional framework of P. massoniana MYB proteins and may be used as a reference for future functional investigations.