Introduction

Proteins tend to evolve through an intricate interplay between sequence divergence, protein structure stability, and functional constraint. In general, protein structure is assumed to be mostly maintained as sequences diverge in order for proteins to fold properly [1]. If a protein does not fold properly, its functional properties are often negatively affected. Based on the PDB collection, protein secondary structure elements and their topology (or fold) are often highly conserved, implying that the topology of protein secondary structure elements can remain very similar even after sequences have diverged beyond recognition.

Nonetheless, most proteins in PDB are shown as static snapshots that belie their structural flexibility. Proteins with highly flexible regions are not amenable to traditional experimental structure determination and, in many cases, such methods are not even attempted [2]. Multidomain proteins are frequently truncated to bypass high flexibility or size restrictions. Examples of shape-shifting or metamorphic proteins [3] that can refold upon changes in domain contacts or by changes in environmental conditions are still rare in PDB, but one interesting example is RfaH, where the C-terminal domain can refold from an all alpha-helical fold to a fold containing only beta strands in response to altered interdomain contacts [4]. This extreme case of fold transition and conformational flexibility illustrates that protein structure is not always conserved among homologous proteins and emphasizes the importance of domain context for our understanding of protein fold space. However, conformational flexibility is not always as dramatic as for metamorphic proteins and smaller changes are more common.

Many proteins exist as conformational ensembles rather than as a single conformation. This enables the rapid sampling of multiple conformations in a flattened, rugged energy landscape where certain conformations may predominate, even for proteins with global intrinsic disorder and for proteins with intrinsically disordered regions (IDRs) [5]. Some IDRs can act as dynamic switches in response to various signals such as pH, temperature, ligands, allosteric effectors, and post-translational modifications [6], allowing the conformational ensemble to re-equilibrate, ultimately causing a population shift [7]. In accordance with the extended conformational selection processes, binding events ranging from lock-and-key to induced fit are plausible, including IDRs that participate in important interactions mediated by fold-upon-binding events [8], or bind without folding [9]. Furthermore, the conformational ensemble undergoes population shifts in response to point mutations and sequence variability [10]. Consequently, amino acid replacement will ultimately impact processes of conformational selection in response to different stimuli in a lineage-specific manner, as a mutation-driven conformational selection process [11]. For ambiguous sites that are disordered in some PDB structures but ordered in others, the amount of ambiguity depends on exposure to different environments, implying that regions with conflicting disorder assignments should not be regarded as lacking intrinsic disorder entirely [12]. By definition, a region of ambiguous disorder can be either disordered or ordered, depending on the environmental conditions, similar to previously described dual-personality regions [13].

Structurally disordered proteins are often found to interact with many different cellular targets and to perform promiscuous or moonlighting functions [14,15,16]. As the conformational ensemble transitions from one favored conformation to another, it may pass through unintended opportunistic functional conformations. Thus, mutation-driven conformational selection provides a mechanism for functional divergence among related proteins and conformational flexibility in proteins may play an important role in the evolutionary innovation and fluctuation of protein functions mediated through IDRs (e.g., [17,18,19,20]). Importantly, mutation-driven conformational selection may mostly be driven by genetic drift in a near-neutral, perhaps deleterious, manner that at times offers a rapid way to adapt to altered environmental conditions or signals.

Contemporary work on intrinsically disordered proteins has illuminated the profound functional importance of disorder, particularly in regard to high-level eukaryotic cellular complexity and the expansion of the eukaryotic proteome. Recent work by Chakrabortee et al., entitled “Intrinsically Disordered Proteins Drive Emergence and Inheritance of Biological Traits,” describes disordered yeast proteins with the capacity to induce heritable molecular memories with specific biological traits, stable over generations and transmissible from individual to individual [20]. The inheritance of these protein-driven traits is prion-like but, importantly, amyloid formation is not detected, and the inheritance-inducing proteins are conserved from human to yeast [20]. Additionally, the relaxed selective pressure experienced by many IDRs may allow for the emergence of parallel, nucleotide-level functionality within the coding regions of disordered or partially disordered proteins [21]. Here, we review fundamental evolutionary underpinnings that have influenced intrinsic disorder content in eukaryotic genomes, with an emphasis on the importance of disorder for eukaryotic proteome expansion and functional divergence, the interplay between secondary structure and disorder on evolutionary time-scales, and the evolutionary dynamics of intrinsic disorder.

Distribution of intrinsic disorder

According to proteome-wide disorder predictions, eukaryotes have a significantly larger fraction of intrinsic disorder in their proteomes than prokaryotes [22, 23]. On average, the disorder content is 7.4, 8.5, and 20.5% in Archaea, Bacteria, and Eukaryotes, respectively [24]. Despite the sharp increase in disorder content from prokaryotes to eukaryotes, the notion that disorder is correlated with organismal complexity (as measured by number of cell types) has not been strongly supported [22, 23]. However, many characteristic features of eukaryotic genomes appear to be linked to intrinsic disorder, particularly those with perplexing evolutionary origins. It is widely noted that concepts of organismal complexity are tightly linked with small effective population sizes, suggesting some type of drift barrier driving complexity in an expanded genome and/or simple relaxed selection on structure, as described below.

Most prokaryotic genomes are densely packed (“wall-to-wall”) with transcribed DNA, containing relatively few intergenic regions or non-coding spacers within their protein-coding genes, whereas eukaryote genome sizes are largely decoupled from their biological information content, and in many taxa, only a small fraction of the total genomic DNA is evidently transcribed [25, 26]. A compelling explanation for this disparity in genome architecture relates to the fundamental theorem of natural selection originally derived by Fisher, namely, that the efficiency of natural selection is directly related to the diversity (and by extension, the effective size) of a population [27]. Recent work suggests that complex genomic features in eukaryotes, including the emergence of large protein families and the presence of intronic regions within protein-coding genes, are the result of “non-adaptive” evolution: persistently low selective pressure maintained by small effective population size [28,29,30]. Interestingly, many hallmark features of eukaryotic proteins such as intronic DNA insertions, large functional domain architectures and complex molecular interaction networks are often associated with or even dependent upon intrinsic disorder (further discussed below).

Structural evidence has confirmed that the non-coding DNA fragments within eukaryotic protein-coding genes (introns) are in fact derived from ancient bacterial Group II selfish elements that were introduced to the nuclear genome by endosymbionts during eukaryogenesis [31,32,33]. This unique evolutionary event facilitated the emergence of the eukaryotic spliceosome, allowing for introns to be removed, as well as for exons (the remaining coding regions) to be rearranged, prior to translation. Nilsen and Graveley contend that alternative splicing has enabled a crucial expansion of the effective eukaryotic proteome [34] and, notably, protein regions associated with alternative splicing are often intrinsically disordered [35, 36].

Laboratory simulations have demonstrated that under strong, efficient selective pressure, genomes become minimally short, and even mildly deleterious genes are eliminated [37, 38]. Consequently, there is mounting support for the notion that rapid eukaryotic genome expansion, and the resulting low-information-density architecture, is a “syndrome” brought about by pervasive genetic drift and low purifying selective pressure [39, 40]. Importantly, this expansion has occurred alongside several other genomic features (some of which were discussed above), and Koonin [41] asserts that the common ancestor of all eukaryotes was of comparable complexity to many modern protists, indicating that expansive, complex genomes are an enduring trait within Eukaryota. Given the close connection that intrinsic disorder has to several defining features of eukaryotes, it is likely that the sharp rise in disordered proteins observed in this lineage is yet another “symptom” of their genomic “syndrome.”

Eukaryote proteome expansion (and what disorder has to do with it)

Gene and genome duplications

During the course of eukaryotic evolution, multiple whole genome duplication (WGD) events are known to have occurred in major eukaryotic lineages. Based on sequence comparison, only more recent WGD events can be detected, but earlier WGD events are probable [42]. A selection of known WGD events is shown in Fig. 1. Paramecium tetraurelia has undergone three rounds of WGD [43]. WGD is common in plants, with both more ancient [44] and recent events, especially in flowering plants [45] but also in moss [46]. In fungi, one WGD occurred in the Saccharomyces cerevisiae lineage [47]. In animals, two rounds of WGD occurred at the origin of vertebrates [48], followed by numerous WGD events in teleosts; e.g., Danio rerio has undergone one additional round of WGD [49], while Salmo salar has undergone a fourth round [50] (not shown). In addition to WGD, small-scale gene duplications (SSDs), whereby one gene or chromosome segment is duplicated, also constitute a major mechanism driving functional divergence in protein family evolution. The evolutionary dynamics of genes that emerged after WGD versus SSD are different, and this has been analyzed in detail [42, 51].

Fig. 1

A selection of known whole genome duplication (WGD) events in eukaryotes. One round of WGD is illustrated by a blue rectangle. The background is colored by geological era. Time axis and geological eras are from TimeTree [113,114,115]

Gene duplications generate redundancy, enabling the exploration of novel functions [52]. Through accumulation of mutations, different evolutionary fates are plausible for the two different copies [28, 53]. The most common scenario after gene duplication is that one copy loses its function and becomes pseudogenized [28]. Retention rates are higher for duplicates that stem from WGD than from SSD, especially for gene copies that are sensitive to altered gene stoichiometry (dosage effects) [42]. For genes that are retained in duplicate, functional divergence between the two copies often results [54]. Proposed models for retention are neofunctionalization [52] and subfunctionalization [55] (Fig. 2). In the neofunctionalization model, one domain copy is able to retain its original function while the duplicated domain can explore new functions. In the subfunctionalization model, the ancestral function is divided amongst the resulting duplicates. Subfunctionalization has been computationally shown to be a neutral process that can result in neofunctionalization [56]. In addition, subfunctionalization in gene expression (dosage) between two duplicated copies contributes to their pattern of retention [57]. Recent work has described the expected interplay of gene dosage with neofunctionalization and subfunctionalization [58].

Fig. 2

Gene duplication generates two copies of the same gene and consequently functional redundancy. Different scenarios after gene duplication include pseudogenization (one copy is lost), subfunctionalization (the two copies subdivide function or gene expression), and neofunctionalization (at least one copy gains a new function)

In vertebrates, the retention rate for ohnologs (proteins related by WGD) from the WGD events at the origin of vertebrates is significantly higher than for SSD for genes involved in protein binding, signal transduction, development, DNA binding, receptor activity, ion transport, and protein modifications [59]. In plants, genes with functions in signal transduction and transcriptional regulation follow a similar pattern [60]. Copies retained after WGD are often dosage-sensitive (sensitive to unbalanced stoichiometry of gene copies) [42]. Many of these protein functions are known to depend on intrinsic disorder [61]. Indeed, intrinsically disordered proteins have been found to be dosage-sensitive (sensitive to unbalanced gene expression) and it was postulated that the promiscuous interactions that disordered proteins frequently partake in could explain the need to maintain stoichiometry [62]. On evolutionary time-scales, multiple interaction partners provide multiple opportunities to subfunctionalize and each partner can neofunctionalize, increasing the selective pressure for both copies from both partners to be retained [42]. Furthermore, after WGD in yeast, proteins enriched in post-translational modification sites are retained at a greater rate [63]. The post-translational modification sites are often found within IDRs [64, 65] and indeed, yeast ohnologs are more intrinsically disordered than singletons (for which the other copy was lost after WGD) [66]. Further comparison of the yeast ohnologs with pre-duplication orthologs shows that 29% of the duplicates and 25% of the singletons have gained disorder, while 37% of the duplicates and 25% of the singletons have lost disorder [66]. The ohnologs that gained disorder were also found to have a higher number of interactions, suggesting that disorder facilitates divergence and innovation [66]. Comparing interactomes of human, fly, and yeast, structurally disordered networks are rewired significantly faster than ordered networks, leading to a speculation that disordered proteins have a higher capacity to rapidly rewire their interactions [67].

Domain rearrangements

Eukaryotic proteins are significantly longer and have more domains than prokaryotic proteins [68]. Domains are the main unit of protein evolution [69]. In addition to sequence divergence, proteins also diverge by rearranging domain architectures and through loss and gain of domains [70, 71]. Eukaryotic multidomain proteins are frequently the result of stepwise insertions of a single domain, but occasionally, several domains are added in tandem [71]. Mostly, established domains that already exist in the proteome are added to proteins and many domains are found in numerous different domain architectures. Gain of a novel (emerging) domain may occur by, e.g., acquisition of novel genetic material or conversion of non-protein-coding genetic material into protein-coding genes, and this novel genetic material is often intrinsically disordered [72]. Disordered, emerging domains were found to spread rapidly across Drosophila lineages [73] and in plants [74]. Domains can also be lost from multidomain proteins [73]. Altered domain architecture may impact the amount of disorder that a protein can withstand, as is the case in the p53 DNA-binding domain [75]. In the p53 family, a choanoflagellate has three of the four domains found in the vertebrate p53 family and the four-domain protein is present in gastropods. All but one domain are missing from the p53 protein in Neoptera. For the p53 DNA-binding domain that is shared from choanoflagellates to vertebrates, the disorder content is positively correlated with the number of domains in its domain architecture. The neopteran proteins have not only lost the other domains but also disorder content, while the early choanoflagellate and four-domain gastropod proteins have disorder content similar to the 3–4 domain proteins in vertebrates [75]. In addition, among the three vertebrate paralogs in the p53 family (p53, p63, and p73), p53 has lost one domain. Within the p53 DNA-binding domain, some of the secondary structure elements have lower conservation of disorder in the p53 clade than in the p63 and p73 clades, while others, e.g., one of the main beta strands in the central beta sheet, are conserved in disorder in the p53 clade but are not disordered in the p63 and p73 clades [75].

Expansions of eukaryotic proteins are often due to insertion of disordered sequence [76]. A common event in protein evolution is the occurrence of insertions and deletions (indels) [77]. Indels have been demonstrated to have high disorder content, with longer indels being particularly disordered [78]. However, indels do not induce disorder but rather appear to accumulate in regions that are already disordered [76]. Repeat sequences, which are often disordered [79,80,81], have been associated with increased indels [82]. At the gene level, indels often occur in multiples of three, an indication that there may be selective pressure to maintain the reading frame, as a frameshift mutation may be deleterious [83]. Predictions on the effect of known frameshift mutations showed that the majority were gene-damaging [84]. Deleterious mutations caused by a frameshift indel may be compensated for by another indel that restores the reading frame [83, 84].
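The reading-frame argument can be illustrated with a minimal Python sketch. The function names and the signed-length encoding of indels (positive for insertions, negative for deletions, in nucleotides) are assumptions made for illustration and are not taken from the cited studies [83, 84].

```python
def preserves_frame(indel_lengths):
    """Return True if the net length change is a multiple of three.

    indel_lengths: iterable of signed integers in nucleotides
    (positive = insertion, negative = deletion); hypothetical encoding.
    """
    return sum(indel_lengths) % 3 == 0


def is_compensated(first_indel, second_indel):
    """Return True if the first indel alone shifts the reading frame
    and the second indel brings the net change back to a multiple of three."""
    shifts_frame = not preserves_frame([first_indel])
    return shifts_frame and preserves_frame([first_indel, second_indel])
```

For example, a 1-nucleotide insertion followed downstream by a 1-nucleotide deletion restores the frame, although the codons between the two indels remain altered, so compensation of this kind is only partial at the protein level.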

Sequence divergence rate in disordered sites

Early research has suggested that intrinsically disordered regions diverge rapidly in sequence [85, 86]. However, in a later study, disorder-promoting residues were found to have higher conservation in disordered regions than in ordered regions, and more than 25% of the disordered sites evolved more slowly than the ordered sites [87]. A possible reason for such conflicting results is that, in general, the relationship between sequence divergence and intrinsic disorder has been conceptualized in a “one-way” statistical framework, without direct consideration of the possible interaction among the multiple structural factors that drive sequence divergence. To address this, a large-scale study of metazoan protein families investigating the interaction of disorder, secondary structure, and functional domains on site-specific sequence divergence rates was recently performed [88]. Focusing only on gap-free sites, with 100% conserved structural predictions across all sequences in each alignment, statistically significant shifts in the rate distributions of opposing structural properties were found: ordered sites tended to be more conserved than disordered sites, sites in secondary structures tended to be more conserved than sites in random coils, and sites within functional domains tended to be more conserved than sites in linkers [88]. However, a considerable overlap between each of these rate distribution pairs was found, and factorial analysis indicated a strong confounding interaction between disorder propensity and secondary structure involvement: sites that were predicted to be disordered, but also involved in secondary structure, were the most evolutionarily constrained at the residue level, even more so than sites within ordered secondary structures [88] (Fig. 3).

Fig. 3

Hypothetical multiple sequence alignment illustrating the relationship between structural properties and the rate of sequence evolution: sites with propensity for both intrinsic disorder and secondary structure tend to evolve slowly

In silico simulations have also found that disorder is more difficult to maintain than secondary structure elements on evolutionary time-scales [89]. The dataset from [88] described above had a total of ~5.9 million gap-free alignment sites, ~29% of which show a mixture of disorder and order among sequences. This result corroborates the notion that disorder is not necessarily a conserved trait among members of a protein family. Other researchers have argued that there are actually distinct types of intrinsic disorder, some of which are retained across lineages and have highly conserved amino acid sequences [90, 91].

Together, these findings are compatible with the realization that different IDRs play diverse and often important functional roles in vivo [92]. For example, whereas some IDRs simply function as entropic chains or flexible linker regions around domains, others act as recognition sites that mediate protein–protein interactions by undergoing disorder-to-order transitions upon binding to their one or many different interaction partners [61].

Disorder-to-order transitions

Real-time disorder-to-order transitions

Regions in proteins that are involved in disorder-to-order transitions are commonly referred to as molecular recognition features (MoRFs) that upon interaction with another protein or nucleic acid can fold into an alpha helical structure, a beta strand, a fixed coil, or a complex mixture of these [93]. On average, eukaryotic proteins contain about 2.5 disordered regions. Of these disordered regions, about one-fifth contain at least one MoRF [94]. Also embedded in disordered regions are short linear motifs (SLiMs) and low complexity regions. Altogether, these contribute to function and functional promiscuity mediated through disordered regions with both beneficial and some less beneficial effects [95]. Notably, MoRFs are known to form transient secondary structural elements in their bound state [93, 96], and it is possible that the highly conserved protein regions described in [88], which are predicted to be both intrinsically disordered and involved in secondary structures, are actually MoRFs. Furthermore, proteins may also contain ordered regions that are activated by unfolding in response to a certain trigger [97]. The triggers range from biomolecular interactions to global environmental factors such as temperature, pH, or light, causing these proteins to undergo functionally important order-to-disorder transitions in real time [97].

Evolutionary time-scale disorder-to-order transitions

Disorder evolves in patterns that suggest it contributes to fine-tuning regulation, stability, and interactions, especially after gene duplication. Some of these functions are induced through post-translational modification, such as phosphorylation. As noted above, proteins enriched in post-translational modification sites are retained at a higher rate after WGD [63]. Importantly, sites enabling post-translational regulation have been found to systematically contribute to functional divergence after gene duplication [98]. SLiMs that promote transient interactions with other proteins are abundant in disordered regions. While some SLiMs are conserved, others are rapidly gained and lost in different lineages, as well as after gene duplication [99]. Beneficial motifs that have an adaptive phenotype are thought to (1) become fixed more frequently and (2) optimize the motif binding pocket, sometimes at the expense of the motif itself [99]. A similar scenario can be envisaged for disorder. Disordered regions are present as a conformational ensemble at equilibrium, but when a non-functional disordered region gains a conformation with a possibly beneficial function (e.g., displaying a SLiM, sometimes by chance), mutations may stabilize that conformation further, driving the initial conformational equilibrium towards that conformation and eventually, the disordered region will become ordered (Fig. 4). By becoming ordered, the protein can undergo a neostructuralization event, where it obtains ordered structured regions not present in ancestral homologs [19]. Gaining an ordered region can put homeostasis at risk, since loss of disorder increases the protein’s half-life and disorder content can potentially fine-tune protein turnover rate on evolutionary time-scales [100]. One can speculate that the previously disordered, now ordered, segment has increased its fitness, allowing another region to become less structurally constrained. Thus, an ordered region can transition towards disorder, perhaps through transient functional conformations and motifs. Eventually, a transition from order to disorder has occurred on evolutionary time-scales. It should be noted that even if the same region transitions from disorder to order and back to disorder, the conformational ensemble will likely have a different composition (Fig. 4).

Fig. 4

Disorder-to-order transition and order-to-disorder transition on evolutionary time-scales. Disorder-to-order: a region from a hypothetical disordered protein becomes, e.g., preferentially stabilized, driving the equilibrium of the conformational ensemble towards solely the preferred conformation. The preferred conformation is further stabilized by mutations and becomes incrementally predominant. Over time, the region becomes ordered, displaying only the predominant conformation. Order-to-disorder: a region from a hypothetical ordered protein starts to become more flexible, but is stabilized under certain conditions. Flexibility is beneficial and mutations to promote disorder accumulate, perhaps additional functions arise, and additional preferred conformations may become more predominant. Over time, the region becomes disordered, existing as a conformational ensemble

Evolutionary transitions from disorder-to-order and from order-to-disorder were observed in a large-scale study of 17 kinase paralogous clades. Looking at patterns of disorder conservation within and between clades, disorder-prone regions are apparent [101]. The disorder-prone regions have conserved regions of disorder in multiple clades, but not necessarily in closely related clades. This suggests that even if disorder is found for the same region in two different clades, the disorder may be a homoplasic trait (due to convergence) with important differences in the conformational ensemble and consequently, function may not be the same. Notably, no disorder-prone region is conserved across all 17 clades [101]. Within orthologs, certain sites are undergoing disorder-to-order transitions on evolutionary time-scales in a lineage-specific manner, characterized by a moderate disorder-to-order transition rate. Lineage-specific changes in conserved disorder are also present in the p53 family: the p63 and p73 clades have strong signals of regions that have become ordered in the ray-finned fish lineage, implying functional divergence [75]. Similar results are observed in Arabidopsis NAC transcription factors, where intrinsic disorder is not conserved across the entire family though subgroup-specific patterns can be found [102]. Additional examples of protein families where disorder prediction implies that evolutionary disorder-to-order transitions have occurred are the mediator complex [103], the vertebrate prion protein family [104], the clusterin family [19], and the synuclein family [19]. In emerin, various phylogenetic groups showed differential tendencies towards being disordered [105].

Fig. 5

Superimposing disorder prediction onto a multiple sequence alignment enables rate inference of disorder and order for homologous sites over a phylogenetic tree based on the corresponding multiple sequence alignment. Conservation of intrinsic disorder versus rate of disorder-order transition for four hypothetical sites: while the first site is conserved in disorder, sites 2–4 have a conservation of 0.5 but the rate varies from slow to fast depending on the pattern of disorder and order in the evolutionary context
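The distinction between conservation and transition rate can be made concrete with a minimal Python sketch. The four toy sites, the "D"/"O" state encoding, and the function names are hypothetical. Counting state changes between neighbouring sequences is only a crude stand-in for a rate inferred over a phylogenetic tree, and it assumes the sequences are listed in an order that reflects their relatedness.

```python
def disorder_conservation(states):
    """Fraction of sequences predicted disordered ('D') at one alignment site."""
    return states.count("D") / len(states)


def count_transitions(states):
    """Number of D/O state changes between neighbouring sequences.

    Crude proxy only: assumes neighbours in the string are neighbours
    on the tree; a real analysis would infer the rate on the phylogeny.
    """
    return sum(a != b for a, b in zip(states, states[1:]))


# Hypothetical sites: one character per sequence, D = disordered, O = ordered.
sites = {
    "site1": "DDDDDDDD",
    "site2": "DDDDOOOO",
    "site3": "DDOODDOO",
    "site4": "DODODODO",
}

for name, states in sites.items():
    print(name, disorder_conservation(states), count_transitions(states))
```

In this toy example, site 1 is fully conserved in disorder with no state changes, whereas sites 2 to 4 all have a conservation of 0.5 but show 1, 3, and 7 state changes, respectively.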

The evolutionary disorder-to-order transitions are potentially biased from disorder to order since disorder is difficult to maintain on evolutionary time-scales [89], but transitions in both directions must occur. When different models of evolution were constructed for disordered versus ordered proteins, the resulting disordered and ordered matrices showed that substitutions from order-promoting residues to disorder-promoting residues were unlikely for both matrices, though they were slightly more likely for disordered proteins [106]. Considering that different studies have found that the degree of sequence conservation in disordered regions depends on structural and functional properties of the disordered sites [85, 87, 88, 107], e.g., sites with both disordered and structured properties are more conserved than other disordered sites [88, 107], it is necessary to carefully construct such models considering additional properties of the disordered sites. In addition, even if disorder may be found in disorder-prone regions, these are not necessarily conserved, and care must be taken to ensure disorder conservation across compared sites. Disorder patterns that seem conserved between two paralogous clades can arise from convergent evolution [101], but further research is needed in this area. Nevertheless, patterns of disorder can be informative in finding remote homologs that are difficult to detect with sequence-based methods alone, and have been found to identify remote Myc homologs [108] and remotely related E3 ubiquitin-protein ligases [109], but clustering of sequences based on such patterns may be more informative for functional inference than for phylogenetic signal.

Conservation of functional disorder

Bellay et al. classified disordered sites among yeast orthologs into functionally constrained disorder, considering disorder to be conserved if at least 50% of sequences at an alignment site were predicted to be disordered and constrained if sequence was conserved at 50% [90]. Furthermore, sites were classified to have functionally flexible disorder if at least 50% of sequences at a site were predicted to be disordered but with sequence conservation below 50%. Last, sites with few disorder predictions were classified as non-functional disorder. Using slightly more generous cutoffs across metazoa, constrained disorder was allowed to be less conserved (>30%) in disorder but highly conserved (>90%) in sequence while flexible disorder was less conserved in sequence (<90%) showing that approximately 30% of sites were disordered (constrained or flexible) and that more constrained disorder is found for human proteins that lack yeast orthologs (8%) than for human proteins with yeast orthologs (5%) [91]. While this may indicate that the older orthologs have lost disorder or that more disordered domains have emerged or spread after the divergence of yeast and metazoa, the arbitrary cutoffs in these studies are concerning since an arbitrary cutoff of 50% disorder conservation at a site could mean that the state changed one time or that it is changing between every other species with high explicit impact on the evolutionary dynamics (or rate) by which disorder is lost or gained (Fig. 5).
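The cutoff-based scheme described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation used in [90]: sequence conservation is taken here as the frequency of the most common residue in the column, sites without any disorder prediction are given a separate "ordered" label, and all names are hypothetical.

```python
from collections import Counter


def classify_site(residues, disordered, disorder_cutoff=0.5, sequence_cutoff=0.5):
    """Classify one gap-free alignment column with the cutoffs from the text.

    residues:   string of amino acids, one per sequence.
    disordered: list of booleans, True where the residue is predicted disordered.
    Sequence conservation = frequency of the most common residue (assumption).
    """
    disorder_fraction = sum(disordered) / len(disordered)
    if disorder_fraction == 0:
        return "ordered"  # label added for this sketch
    if disorder_fraction < disorder_cutoff:
        return "non-functional disorder"
    most_common_count = Counter(residues).most_common(1)[0][1]
    sequence_conservation = most_common_count / len(residues)
    if sequence_conservation >= sequence_cutoff:
        return "constrained disorder"
    return "flexible disorder"
```

Passing `disorder_cutoff=0.3` and `sequence_cutoff=0.9` approximates the more generous metazoan scheme [91], up to the treatment of sites that fall exactly on a cutoff. Note that the classification depends only on column-wise fractions, so it cannot distinguish a single state change from repeated changes along the tree.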

Protein evolvability and disorder

Examining the fold distribution according to the CATH database, about 1300 folds describe the experimentally determined protein structure space [110]. More than half of the non-redundant domains in CATH can be described by the 100 most frequently found CATH superfamily domains [110]. Many of these domains have folds that display regular secondary structure architectures with supersecondary structures forming a stable core [110]. These are folds with high evolvability. Like disordered regions, these are characterized by high sequence divergence and a plethora of functional contexts. One important distinction must be made; while the common folds can promote various functions, proteins that assume these folds typically only have one function, whereas disorder enables functional versatility within the same protein. The amount of disorder is positively correlated with robustness to withstand mutations while still maintaining structure and both are negatively correlated with fold complexity [111]. In this context, fold complexity is defined as average contact order based on the linear distance in the sequence between two contacting residues. Alpha-helices have low contact order due to their local contacts [112] and consequently, several of the most common CATH folds, with regular secondary structure architectures and rich in supersecondary structures, may also have low contact order and thus low fold complexity. The disordered sites that also have propensity to form secondary structure are more conserved [88]. Thus, this category of disorder appears to have lower robustness that may be due to an increased constraint to fold under certain conditions.
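To make the notion of contact order concrete, the following minimal Python sketch computes the mean sequence separation over contacting residue pairs, optionally normalized by sequence length. The exact variant used in [111] is not stated here, so the normalization, the function name, and the toy contact lists are assumptions.

```python
def average_contact_order(contacts, sequence_length=None):
    """Mean sequence separation |i - j| over contacting residue pairs.

    contacts: iterable of (i, j) residue index pairs that are in contact.
    If sequence_length is given, the mean is divided by it (relative form).
    """
    separations = [abs(i - j) for i, j in contacts]
    if not separations:
        raise ValueError("at least one contact is required")
    mean_separation = sum(separations) / len(separations)
    if sequence_length is not None:
        return mean_separation / sequence_length
    return mean_separation


# Toy contact lists (hypothetical): helix-like i -> i+4 contacts versus
# contacts between strands that are far apart in sequence.
helix_like = [(i, i + 4) for i in range(1, 17)]
sheet_like = [(i, 41 - i) for i in range(1, 11)]
```

With these toy inputs, the helix-like contacts give a mean separation of 4 residues, whereas the long-range strand pairing gives 30, illustrating why local, helical contacts correspond to low contact order.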

Evolution of disorder drives biological diversity

Using Bellay’s criteria [90], only a small fraction of protein sequence space contains functional disorder. Indeed, most disordered regions appear to experience relaxed selective pressure, and thus, high amino acid substitution rates [85, 88]. However, it is now also clear that intrinsically disordered sequences should be considered in a larger structural and functional context to evaluate the evolutionary pressures that act upon them. Moreover, the interplay between intrinsic disorder and other structural/functional properties is likely to have unforeseen, confounding effects that can only be detected using appropriately complex analyses [88].

What Bellay et al. [90] classify as non-functional disorder may in fact contribute significantly to natural variation within a species and to biological diversity between species. While some disordered regions may need to perform predictable, reliable functions, others may be important for generating subtle changes in response to a signal. By accumulating tiny changes in function affecting protein dynamics, binding affinities, and promiscuous and moonlighting functions, subtle variation and diversity can emerge within a population or protein family. Ultimately, such small changes in disorder content can greatly impact a population’s response to changes in the environment.

If disorder can be used to prime or seed molecular memories that promote a heritable and beneficial trait [20], can that trait be selected for, in the sense that disorder-prone residues will start to become replaced with order-prone residues that can fold into the beneficial conformation without the original primer or seed if the environmental trigger remains? Additionally, if IDRs tend to occur in evolutionarily labile sequence regions, can they serve as hotbeds for the novel acquisition of parallel, nucleotide-level biological function [21]? Hopefully, future work will shed more light on the increasingly broad functional capacity of intrinsic disorder in eukaryotes. Still, what has been discovered so far provides compelling evidence for the notion that protein disorder is an indispensable component of the seemingly non-adaptive evolutionary processes responsible for the striking complexities and functional novelties observed throughout the eukaryotic lineage.