Introduction

The traditional structure-function paradigm states that protein function depends on a well-defined three-dimensional structure. However, there are regions of proteins and even complete proteins that are fully functional even though they do not fold into secondary or tertiary structures in solution (Uversky 2011; Pancsa and Tompa 2012). These are known as intrinsically disordered regions/proteins (IDRs/IDPs) and are present in all domains of life (Xue et al. 2010; Xue et al. 2012; Yruela et al. 2017). In eukaryotes, it is estimated that between 23 and 28% of proteins are highly disordered and more than 50% of eukaryotic proteins contain long IDRs (greater than 30 amino acids [aa]) (Xue et al. 2012). Structural disorder is considered to be significantly higher in eukaryotes than in prokaryotes, and it has been associated with organismic complexity (Ward et al. 2004; Liu et al. 2006; Xue et al. 2012; Peng et al. 2014; Yruela et al. 2017). Some gene families are particularly enriched in IDPs (Dai et al. 2016), and the total collection of IDPs/IDRs in a proteome is called the disordome (Zamora-Briseño et al. 2018).

IDPs/IDRs are characterized by their bias in aa composition, the low complexity in their sequences, and their low content of bulky hydrophobic aa (Romero et al. 2001; Wright and Dyson 2015). Several residues, known as order-promoting residues (W, C, F, I, Y, V, L, H, T, and N), are underrepresented, while they have an abundance of proline and polar and charged residues, known as disorder-promoting residues (K, E, P, S, Q, R, D, and M). Finally, the content of A and G is considered to be similar between IDPs and their ordered counterparts (Radivojac 2003). These characteristics in the primary structure give IDPs/IDRs a high net charge and a low average hydrophobicity (Uversky et al. 2000).
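As a minimal illustration of this compositional bias (a sketch, not part of the original analysis), the fraction of order-promoting and disorder-promoting residues in a sequence can be computed directly from the two residue sets listed above. The function name `residue_class_fractions` is hypothetical, and any residue outside both sets (including A and G) is simply counted as "other".

```python
# Residue classes as listed in the text above.
ORDER_PROMOTING = set("WCFIYVLHTN")
DISORDER_PROMOTING = set("KEPSQRDM")


def residue_class_fractions(sequence):
    """Return the fraction of order-promoting, disorder-promoting,
    and remaining residues in a protein sequence (hypothetical helper)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    n = len(seq)
    order = sum(1 for aa in seq if aa in ORDER_PROMOTING)
    disorder = sum(1 for aa in seq if aa in DISORDER_PROMOTING)
    return {
        "order_promoting": order / n,
        "disorder_promoting": disorder / n,
        "other": (n - order - disorder) / n,
    }


if __name__ == "__main__":
    # Toy sequence, chosen only for illustration.
    print(residue_class_fractions("KKEEWWAG"))
```

Such a count is only a rough descriptor of a sequence; it is not a disorder predictor and does not replace the dedicated programs discussed below.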

Intrinsic disorder promotes structural flexibility, and this flexibility allows fast transitions between different structural states, which promotes multispecific functions (Romero et al. 2001; Radivojac 2003; Uversky 2011; Sun et al. 2013; Covarrubias et al. 2017; Zamora-Briseño et al. 2018). IDPs/IDRs are associated with the regulation of transcription, signaling, and stress responses (Sun et al. 2013; Pietrosemoli et al. 2013).

The ubiquitous nature of IDPs in multiple cellular processes has encouraged the development of programs for intrinsic protein disorder prediction, which are based on the physicochemical attributes of these proteins. Some of these predictors have been shown to be highly reliable (Romero et al. 2001; Peng et al. 2006; Mészáros et al. 2009; Xue et al. 2010; Walsh et al. 2012; Dosztányi 2018). It is now possible to estimate the content of IDPs at the proteomic scale with high confidence (Walsh et al. 2012; Kurotani et al. 2014; Yruela and Contreras-Moreira 2012). This has made possible a significant number of studies aimed at answering questions about structural disorder at the genomic scale in a large number of models (Schad et al. 2011; Xue et al. 2012; Pietrosemoli et al. 2013; Peng et al. 2014). However, it is often difficult to compare results obtained from different studies and to produce generalizations from them, in part because each study uses different predictors (each with a different confidence level) and different criteria to estimate and classify structural disorder (Pancsa and Tompa 2012).

In plants, global-scale analyses of IDPs are limited to Arabidopsis thaliana and a few other plant models (Pancsa and Tompa 2012; Yruela and Contreras-Moreira 2012; Pietrosemoli et al. 2013; Kurotani et al. 2014; Vincent and Schnell 2016; Liu et al. 2017; Alvarez-Ponce et al. 2018). This limits the identification of biological roles of IDPs without homologous functions in other models. For example, plants have developed systems that allow them to adapt to the environment from which they cannot escape (Moore et al. 2008; Schad et al. 2011; Xue et al. 2012; Pietrosemoli et al. 2013; Peng et al. 2014). Since IDPs participate in signaling cascades and stress response processes, IDPs may be particularly important in plants’ development and adaptation to their environment (Kovacs et al. 2008; Pietrosemoli et al. 2013; Liu et al. 2017; Alvarez-Ponce et al. 2018; Zamora-Briseño et al. 2018). Furthermore, although conclusions derived from other models may be applicable to plants, this is not always the case. For example, in a study evaluating the correlation between the occurrence of post-translational modifications and IDPs/IDRs in plants, it was found that while phosphorylations, acetylations, and O-glycosylations show a preference for IDPs/IDRs as in animals, methylations occur preferentially in ordered regions (Kurotani et al. 2014).

In this study, we predicted intrinsic disorder in 96 proteomes of plants. We found bias in the relative disordome content among the different clades analyzed, with significant differences between monocots and eudicots. Unlike other reports, we classified disorder predictions into four categories (0–25, 25–50, 50–75, and 75–100% of intrinsic protein disorder). Based on this criterion, we observed that protein roles depend on their disorder level. The disorder level affects the abundance of aa and influences protein size, its distribution in the cell, and protein functions. For these reasons, we consider that the disordome may have major adaptive implications.

Materials and methods

In order to predict intrinsic protein disorder in plant proteomes, we downloaded proteomes available in the Ensembl Genomes (Howe et al. 2020) and Phytozome (Goodstein et al. 2012) genomic browsers and from NCBI. All sequences below 30 aa in length were removed, as well as all non-specific positions. For each proteome, disorder prediction was estimated in the Espritz program using “X-ray” and “Best sw” parameters (Walsh et al. 2012). Predictions were grouped into four categories of intrinsic protein disorder: 0–25, 25–50, 50–75, and 75–100%. We estimated the relative abundance of each disorder category for each species. A phylogenetic tree was constructed using PhyloT (https://phylot.biobyte.de/index.cgi) by using the scientific name of each species and results were visualized with iTOL v3.4 (Letunic and Bork 2019).
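The filtering and binning steps described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the pipeline actually used: the input is assumed to be a list of (sequence, per-residue flags) pairs in which each flag is True when the residue was predicted as disordered, the function names are hypothetical, and parsing of the predictor output is not shown. The text does not state how the category boundaries are handled, so half-open intervals are assumed here (exactly 25% falls into 25–50, and 100% into 75–100).

```python
from collections import Counter

MIN_LENGTH = 30  # sequences shorter than 30 aa are discarded, as in the text
CATEGORIES = ("0-25", "25-50", "50-75", "75-100")


def disorder_percent(flags):
    """Percentage of residues flagged as disordered (True) in one protein."""
    return 100.0 * sum(flags) / len(flags)


def disorder_category(percent):
    """Assign a disorder percentage to one of the four categories.
    Boundary handling is an assumption (half-open intervals)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must lie between 0 and 100")
    if percent < 25:
        return "0-25"
    if percent < 50:
        return "25-50"
    if percent < 75:
        return "50-75"
    return "75-100"


def relative_abundance(proteins):
    """Relative abundance of each category in one proteome.
    `proteins` is an iterable of (sequence, per_residue_flags) pairs."""
    counts = Counter()
    for sequence, flags in proteins:
        if len(sequence) < MIN_LENGTH:
            continue  # length filter
        if len(flags) != len(sequence):
            raise ValueError("one flag per residue is required")
        counts[disorder_category(disorder_percent(flags))] += 1
    total = sum(counts.values())
    if total == 0:
        return {category: 0.0 for category in CATEGORIES}
    return {category: counts[category] / total for category in CATEGORIES}
```

Applied to each proteome in turn, `relative_abundance` returns four fractions that sum to one, which is the kind of per-species vector that can then be compared among clades.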

We estimated the abundance of each aa per intrinsic protein disorder category for monocots and eudicots. To find enriched ontological functions among each category, protein sequences were annotated with InterproScan5 (Jones et al. 2014). This allowed us to handle annotated proteins with a homogeneous criterion. Then, a random sub-sample of 25,000 proteins was taken from each category to be analyzed using the WEGO online program (Ye et al. 2006) and to compare parental ontological terms that were significantly enriched by category (p < 0.001). The protein length and intrinsic disorder content of each category were compared between monocots and eudicots, using a t test and the Kruskal-Wallis test, respectively. Statistical differences among them were determined in R (R Development Core Team 2016), and data were plotted using ggplot2 (Ginestet 2011). In addition, the binned data in the four categories of disorder were analyzed with a principal component analysis (PCA) biplot, calculated with the FactoMineR library (Lê et al. 2008) in the R environment. A linear discriminant analysis effect size (LEfSe) (Segata et al. 2011) was performed to detect the discriminant protein categories between the eudicots and monocots; significance was set at a p value < 0.05.
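The per-category amino acid abundance described above can be illustrated with a short sketch. It is written in Python for illustration only (the statistical analyses in this study were run in R), and it assumes that each protein has already been assigned a disorder category label. The function name `composition_by_category` and the input layout are hypothetical; characters outside the 20 standard amino acids are ignored, which is also an assumption.

```python
from collections import Counter, defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def composition_by_category(records):
    """Percentage composition of each amino acid per disorder category.
    `records` is an iterable of (category_label, sequence) pairs."""
    counts = defaultdict(Counter)
    for category, sequence in records:
        # Pool residues of all proteins that share a category label.
        counts[category].update(
            aa for aa in sequence.upper() if aa in AMINO_ACIDS
        )
    composition = {}
    for category, counter in counts.items():
        total = sum(counter.values())
        if total == 0:
            continue  # no standard residues seen for this category
        composition[category] = {
            aa: 100.0 * counter[aa] / total for aa in AMINO_ACIDS
        }
    return composition
```

The resulting table (categories by amino acids) has the same shape as the data summarized in a composition heat map, with one row per disorder category and clade.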

To examine in detail the association of intrinsic protein disorder with both biological processes and cellular location, the A. thaliana proteome was submitted to a GO enrichment analysis using ShinyGO v0.61 (Ge et al. 2020). Before the analysis, the Arabidopsis proteins were grouped according to their disorder category (p < 0.001 was used to define significantly enriched terms).

Results

Intrinsic protein disorder predictions showed that the proportion of intrinsic disorder is greater in some clades (Fig. 1). In monocots, the Poaceae family (both BOP and PACMAD clades) showed the highest proportion of highly disordered proteins (> 75%), compared with the group of Embryophyta. In eudicots, there was a higher proportion of IDPs < 50%. Some exceptions in this group, with a higher proportion of IDPs > 75% (compared with other eudicots), were D. hygrometricum (resurrection plant from the Asterids), Carica papaya (a drought-tolerant plant from the Malvids), and Jatropha curcas (a drought-tolerant plant from the Fabids).

Fig. 1

Distribution of disordered proteins among plant proteomes. Species on the tree are grouped into two major clades by colored squares next to the tips, as follows: monocots (green square) and eudicots (red square). The relative abundance of disorder was not constant among the analyzed clades

Using PCA, we observed that proteomes of monocots and eudicots can be separated according to the relative abundance of the disorder content (Fig. 2a). In each case, the relative content of proteins with a disorder of at least 25% was statistically higher in monocots than in eudicots. In contrast, eudicots had a higher relative content of ordered proteins (0–25% disorder) than monocots (Fig. 2b). In general, monocot proteomes had higher intrinsic disorder than eudicots (Fig. 2c). Interestingly, proteins with a higher disorder level tended to be smaller, regardless of their clade (Fig. 3).

Fig. 2

Comparison between the proteomes of monocots and eudicots considering the disorder abundance. a PCA biplot of the four categories of disordered proteins between eudicots and monocots. Vectors are plotted towards the direction of their major abundance in the samples. b Paired comparisons reveal differences in the relative protein content of each disorder category. c LEfSe analysis results. The category 0–25 is better represented in the eudicots, while the categories 25–50, 50–75, and 75–100 are better represented in the monocots

Fig. 3

Relationship between level of intrinsic protein disorder and protein length. At a higher level of disorder, the length of the proteins tended to decrease. This pattern was observed for all clades. Post hoc comparisons were of each category against the 0–25 category using Student’s t test. ****p < 0.0001

As expected, the proportion of hydrophobic amino acids was higher for more structured proteins and remained fairly uniform compared with that of small, hydrophilic, and charged amino acids (Fig. 4). In addition, we found that each disorder category was enriched in different GO terms (Fig. 5). For example, the least disordered proteins (< 25% intrinsic disorder) were enriched in catalytic functions (clade 1, Fig. 5), while the most disordered proteins (> 75% disorder) were enriched with GO terms associated with responses to biotic and abiotic environmental stimuli, as well as regulatory processes (clade 2 in Fig. 5). This coincides with results obtained in the ontology analysis of the A. thaliana proteome. There was a clear separation between biological processes in which each category of intrinsic disorder was enriched (Fig. 6). This was more evident the higher the degree of disorder.

Fig. 4

A heat map of the percentage composition of the amino acids that make up the proteins according to the level of intrinsic disorder. The hydrophobic amino acids separated perfectly from the rest, with the exception of P. These amino acids were enriched in the most structured proteins and had a more uniform abundance pattern compared with the rest of the amino acids. The disorder-promoting amino acids (K, E, P, S, Q, R, D) were enriched in the proteins with the highest levels of disorder, with the exception of M, which did not seem to follow this typical behavior. The enrichment in the relative abundance of these amino acids was not constant between clades, which seems to be influenced by taxonomic factors

Fig. 5

GO enrichment analysis of each intrinsic protein disorder category. Highly disordered proteins (clade 2) were enriched in ontologies related to the regulation of nucleus activities, regulation of metabolic processes, response to biotic and abiotic stimuli, and signaling, while ordered proteins were enriched among catalytic activities (clade 1)

Fig. 6

GO enrichment analysis of biological processes for the A. thaliana proteome divided into disorder categories. Biological processes varied among categories

Proteins with 0–25% intrinsic disorder were enriched in biosynthetic processes, such as lipid metabolism processes, catabolic processes or processes associated with phosphorus metabolism, and membrane transport and ion transport through membranes. As relative disorder increased, there was enrichment of biological regulatory processes, such as regulation of gene expression, regulation of transcription, or regulation of metabolic processes (categories 25–50 and 50–75%). Highly disordered proteins (> 75% intrinsic disorder) were specialized in biological processes associated with RNA regulation and RNA processing, such as alternative splicing and RNA transport, as well as negative regulation of metabolic processes and kinase activity (Fig. 6). These observed ontologies correlated with the enrichment of cellular components depicted in Fig. 7. There was a clear separation of GO-associated terms based on their disorder level. Thus, highly ordered proteins (< 25%) were widely enriched in ontological terms associated with various sub-cellular spaces, such as the mitochondria, Golgi apparatus, chloroplasts, or cell wall, but the nucleus was not enriched. In proteins with 25–50% disorder, enriched GO terms included the ribosomes, nucleosome, chromatin, nuclear lumen, nucleolus, or non-membrane-bound organelles. However, as intrinsic protein disorder increased, the diversity of enriched sub-cellular spaces decreased, while the nucleus was enriched. Thus, proteins with > 50% disorder were exclusively enriched in GO terms associated with the nucleus, particularly the nucleolus, spliceosome complex, nuclear pore, and transcription complex. In the 75–100% category, nuclear lumen, splicing complex, or nuclear body were enriched. Thus, the distribution of proteins within cells was associated with their level of disorder, with the nucleus enriched in IDPs.

Fig. 7

GO enrichment analysis of cellular components of the A. thaliana proteome divided into disorder categories. The distribution of the proteins according to their level of disorder in the cell was not random; while the nucleus was enriched in disordered proteins, proteins with less than 25% disorder were more broadly distributed in the cell and were not enriched in the nucleus

Discussion

Studies aimed at understanding the roles of intrinsic protein disorder in plants are still scarce considering the overall number of reports on this topic (Zhang et al. 2018; Zamora-Briseño et al. 2019). In turn, most of these evaluations are not specifically on plants (Xue et al. 2012; Peng et al. 2014) or have been focused on studying very few models (Pazos et al. 2013; Kurotani et al. 2014; Choura et al. 2019). General conclusions obtained in these studies are highly valuable and deserve to be corroborated. For this reason, in this work, we carried out an extensive analysis of the distribution of intrinsic protein disorder by analyzing proteins from the proteomes of 96 plants.

Some previous estimations of the variation in relative intrinsic protein disorder content among plant clades have yielded contradictory results. First, it was found that the relative content of intrinsic protein disorder does not vary between monocots and dicots (Yruela and Contreras-Moreira 2012). However, it was later found that the relative content of protein intrinsic disorder is different between them (Kurotani et al. 2014; Choura et al. 2019). These differences are likely due to the small sample size used, as well as differences in the criteria used to estimate intrinsic protein disorder.

We compared the relative content of intrinsic protein disorder among different plant clades. For categories of disorder greater than 25%, we found that intrinsic protein disorder content is higher in monocots (specifically the Poaceae family) than in eudicots, with the opposite trend in the 0–25% disorder category. For comparative purposes, we considered that this category is mainly composed of structured proteins, while the other three categories are composed of intrinsically disordered proteins with different levels of disorder.

Although the definition of the four disorder categories was not based on any a priori biological criterion, it allowed us to observe a clear relationship between proteins’ level of intrinsic disorder and their functions. We consider that cataloging proteins as ordered versus disordered is overly simplistic. In other words, it is important to determine not only whether or not a protein is intrinsically disordered but also its degree of disorder, since this is associated with function. In some ways, this classification attempts to capture part of the different intrinsic disorder flavors described for IDPs (Dunker et al. 2008; Walsh et al. 2012; Forcelloni and Giansanti 2020).

It is interesting that as intrinsic disorder increases, there is a decrease in protein length. This negative correlation between intrinsic disorder and protein length has been previously reported and is generally accepted (Howell et al. 2012; Peng et al. 2014; Afanasyeva et al. 2018; Zamora-Briseño et al. 2019). This is expected given the biased aa composition of IDPs, because the occurrence of amino acids is associated with protein length (Carugo 2008). Since longer proteins tend to be more conserved than small proteins (Lipman et al. 2002), more disordered proteins are expected to be less conserved. It is known that amino acid changes are faster for proteins with higher proportions of aa exposed to the solvent, as occurs with IDPs (Lin et al. 2007). Moreover, IDPs have a higher mutational rate than globular proteins and have a high tolerance to mutations (Brown et al. 2002; Forcelloni and Giansanti 2020). This suggests that disorder-promoting aa are subjected to reduced evolutionary constraints (relaxed evolutionary forces at these sites) and therefore have a higher mutation rate than order-promoting aa (with stronger evolutionary constraints to keep their function). This may explain why order-promoting aa are clearly separated on the heat map, with a more conserved distribution pattern among clades compared with disorder-promoting aa.

The relative abundance of aa apparently differs among the clades. In general, it is considered that compared with structured proteins, IDPs show a reduction in their contents of C, W, Y, F, I, V, and L, while being significantly enriched in M, K, R, S, Q, P, and E (Dunker et al. 2008). This general rule does not seem to hold fully in plants because M is not enriched in any of the clades. Furthermore, A and G are enriched in IDPs of algae and monocots, although these aa are not usually considered enriched in IDPs (Radivojac 2003). In addition, the enrichment of disorder-promoting aa seems to differ among clades.

Since there is a positive correlation between genetic recombination rate and protein disorder frequency in plants, it has been proposed that genetic recombination could be considered an evolutionary force that contributes to structural disorder in proteins (Yruela and Contreras-Moreira 2013). The fact that IDPs have a higher recombination rate and higher mutability is of particular interest in plant adaptation to challenging environmental conditions. According to our GO enrichment analysis, highly intrinsically disordered proteins are enriched in the response to biotic and abiotic stimuli. Moreover, intrinsic disorder is higher in young genes and in genes created de novo in alternative reading frames, as well as in orphan genes of several non-plant species (Rancurel et al. 2009; Mukherjee et al. 2015; Wilson et al. 2017). It is therefore possible that proteins encoded by young and orphan genes in plants possess a higher degree of disorder, a question that remains to be addressed in the future.

Highly disordered proteins (> 75% intrinsic disorder) are particularly enriched in the regulation and transport of RNA, as well as RNA splicing. This is consistent with previous data indicating that a large number of proteins that bind to RNA exhibit broad IDRs. For example, it is estimated that more than 50% of amino acid residues of RNA chaperones occur in IDRs (Tompa and Csermely 2004). This has wide-reaching consequences. For example, alternative splicing is a very important process for stress-induced responses in plants, which can modulate the phenotypic traits of plants and can contribute to their adaptations to different environmental stressors (Mastrangelo et al. 2012; Ling et al. 2019).

In addition, intrinsic disorder influences the sub-cellular localization of proteins; IDPs are enriched in the nucleus, as has been suggested for other non-plant models (Frege and Uversky 2015; Skupien-Rabian et al. 2015). This is reasonable considering that IDPs have functional specialization (Vincent and Schnell 2016; Deiana et al. 2019) and such functional specialization also depends on protein length (Howell et al. 2012). Considering that monocots possess a higher proportion of IDPs than eudicots, it is plausible that monocots also possess a higher proportion of nuclear proteins than eudicots.

Finally, given that a large part of the proteome with unknown functions (dark proteome) is enriched in IDPs (Bhowmick et al. 2016), it can be inferred that a large part of the disordome represents a reservoir of potential functions involved in the stress response that are waiting to be discovered. This may be exploited for biotechnological purposes, particularly those aimed at increasing resistance to environmental stressors. Thus, we consider that the characterization of genes that encode IDPs with unknown functions and that respond to stress could lead to the discovery of new mechanisms of stress response in plants.

Conclusion

This study represents the most extensive analysis of intrinsic protein disorder in plants to date. It is evident that the level of intrinsic disorder actively influences several functional characteristics of proteins beyond their lack of folding. In plants, the involvement of intrinsic disorder in environmental adaptation processes is of particular importance and represents a promising opportunity for discovery.