13.1 Introduction

The understanding of disordered, or unstructured, proteins began to develop at the end of the twentieth century and the start of the twenty-first. At present, many researchers around the world are working to describe the structure and function of disordered regions. A large proportion of gene sequences appear to code not only for folded, globular proteins but also for long stretches of amino acids that are likely to be either unfolded in solution or adopt non-globular structures of unknown conformation (Wright and Dyson 1999). Approximately 44% of human protein-coding genes contain disordered regions (Van Der Lee et al. 2014; Oates et al. 2013). These proteins or regions are generally termed intrinsically disordered proteins or intrinsically disordered protein regions (IDPs/IDPRs). IDPRs can be highly conserved in both composition and sequence within closely related protein families or domains (Van Der Lee et al. 2014; Chen et al. 2006). The disordered regions are partially or fully unstructured and are characterized based on various parameters. In other words, they do not adopt a stable three-dimensional structure that can be resolved by techniques such as X-ray crystallography or NMR spectroscopy (Dunker et al. 2001). The intrinsic lack of structure can confer functional advantages: IDPs provide a larger interaction surface area and greater conformational flexibility, and their exposed, interaction-prone structural motifs allow them to interact with several other proteins (Babu et al. 2011).

Furthermore, distinct post-translational modifications facilitate the regulation of their function and stability in a cell. Some IDPs, known as folders, can attain a fixed tertiary structure on interaction with other molecules. In contrast, others, called non-folders, do not possess any defined tertiary structure under any physiological conditions. Some IDPs undergo partial folding on interaction with specific binding partner proteins (coupled folding and binding), whereas many others constitute flexible linkers that have a role in the assembly of macromolecular arrays (Nishimura et al. 2005). Their conformations may vary from random coils and partially extended globules to collapsed globules with different contents of secondary structure. This variable structural behavior of IDPs led to the proposal of multi-state protein structure theories such as the trinity (collapsed, ordered, and extended disorder) and the quartet (coil, pre-molten globule, molten globule, and folded structure) (Zhang et al. 2013; Dunker and Obradovic 2001).

To elucidate the structure of IDPs and gain detailed mechanistic insight into their function, the different conformations of IDPs first need to be determined. Molecular dynamics (MD) simulation is an excellent computational route for determining the disordered states of proteins at the atomic level. However, the quality of MD simulation results depends on the accuracy of the physical model (i.e., the force field) used (Robustelli et al. 2018). A number of force fields have been used to describe folded proteins, but only a limited number are suitable for predicting disordered structures (Nerenberg et al. 2012; Piana et al. 2015; Best et al. 2014; Mittal and Best 2010; Lindorff-Larsen et al. 2012, 2013; Beauchamp et al. 2012; Lange et al. 2010). Therefore, in this chapter, we focus on IDPs and on how MD simulation explores structural disorder with different force fields.

13.2 IDPs and IDPRs: Structure-Function Relationship

The lock-and-key hypothesis and the structure-function paradigm shaped protein science for a long time. The 3D structures of proteins were mapped mostly with X-ray crystallography. Despite that, most proteins lack complete structures and contain so-called missing electron density regions (Le Gall et al. 2007). These proteins and regions are unfolded and unstructured, and because this is an inherent property of the protein, they are named “intrinsically disordered proteins or intrinsically disordered protein regions (IDPs/IDPRs)” (Dunker et al. 2013). Comparative studies of ordered and disordered proteins revealed that disordered proteins are rich in the amino acids Ala, Arg, Gly, Gln, Ser, Glu, Lys, and Pro (Williams et al. 2000; Romero et al. 2001).

Ordered proteins follow the structure-function paradigm, i.e., sequence-structure-function, whereas disordered proteins follow the disorder-function paradigm (Uversky 2013). Disordered proteins are abundant in all three kingdoms of life and in viruses, which suggests their importance (Gadhave et al. 2020; Garg et al. 2019; Giri et al. 2016, 2020; Singh et al. 2018a; Kumar et al. 2017, 2019, 2020a; Schad et al. 2011). A particularly interesting property of IDPs is their functional versatility, which can be explained by the “fly-casting mechanism” (Huang and Liu 2009; Shoemaker et al. 2000) (Fig. 13.1). Moreover, IDPs can function either in their native disordered state or by binding to a partner and acquiring a folded state (Tompa 2005; Uversky and Dunker 2010, 2013; Tompa and Fuxreiter 2008; Dunker et al. 2002). This functional diversity lies in the sequence heterogeneity of IDPs, which allows them to bind different partners and thus adopt different conformations and functions (Oldfield et al. 2008). Further, IDPs/IDPRs possess a larger surface area and structural flexibility. Owing to this flexibility, they tend to expose peptide regions with molecular recognition features (MoRFs), which may fold while interacting with binding partners (Kumar et al. 2017, 2020a; Mohan et al. 2006; Oldfield et al. 2005; Uversky et al. 2005; Singh et al. 2018b; Mishra et al. 2018). A classic example of an IDP that gains multiple conformations with different binding partners is the p53 C-terminal domain (p53-CTD). The p53-CTD (residues 374–388) binds to different partners and acquires different conformations, viz., with cyclin A (coil), sirtuin (sheet), CBP (coil), and S100bb (helix) (Uversky 2009; Fadda and Nixon 2017; Kannan et al. 2016).

Fig. 13.1 The versatility of intrinsically disordered proteins

Besides the binding partners, which allow IDPs/IDPRs to perform multiple functions, the surrounding environment needs to be considered. Many IDPs show a change in conformation in the presence of varying pH, temperature, ions, detergents, organic solvents, crowding agents, and lipids (Kumar et al. 2020a; Uversky 2009; Lopes et al. 2013; Kjaergaard et al. 2010). The surrounding environment modulates electrostatic interactions, hydrophobic interactions, and the osmophobic effect, which help IDPs to gain structural conformation (Uversky 2009).

13.3 IDPs in the Human Genome: Organizing Functions or Problems?

Despite being physiologically disordered, IDPs play crucial roles in biological activities. The abundance of IDPs in complex cellular organization indicates their importance in regulatory processes (Uversky et al. 2008). These processes include molecular recognition, molecular assembly, entropic activities, and post-translational modifications. Various studies have reported the presence of IDPs in human regulatory proteins such as transcription factors and co-regulators. Eukaryotic proteins seem to use disorder for transient binding purposes (signaling and regulation), while prokaryotic proteins seem to use disorder for longer-lasting interactions, such as complex formation. A recent report suggested that functional misfolding can be induced by transient changes in the protein environment, and that the structure can be recovered by restoring the environment or reversing the modifications. This inducible and transient character is an important feature of these IDPRs, or conditionally disordered protein regions (Uversky 2015). Studies of the occurrence of IDPs in viruses have shown that they play crucial roles in hijacking the host cellular machinery (Gadhave et al. 2020; Garg et al. 2019; Giri et al. 2016, 2020; Xue et al. 2010; Kumar et al. 2020b).

The versatile nature of IDPs is associated with folding, signaling, and many other processes; however, IDPs are also implicated in many diseases. Selective mutations in IDPs/IDPRs (e.g., amyloid β-peptide, α-synuclein, and huntingtin) may lead to structural complexity and enhanced aggregation propensity of these systems, which are associated with numerous neurodegenerative diseases (Babu et al. 2011; Uversky et al. 2008; Wu and Fuxreiter 2016). IDPRs contain certain motifs that are important for interaction, and a slight change in these motifs leads to altered cell signaling and thus to diseases such as cancer (Babu 2016; Hegyi et al. 2009; Colak et al. 2013).

According to disorder predictors, nearly 75% of mammalian signaling proteins are predicted to contain long disordered regions of more than 30 residues, and about 25% of the proteins are predicted to be fully disordered (Dunker et al. 2008a). As noted above, eukaryotic proteins tend to use disorder for transient binding (signaling and regulation), while prokaryotic proteins seem to use it for complex formation (Dunker et al. 2008b). Transmembrane proteins, which contain extracellular or cytosolic disordered regions of varied length, are another example (Uversky 2013; Xue et al. 2009). About 40% of human integral plasma membrane proteins are predicted to contain long disordered regions (Minezaki et al. 2007; Yang et al. 2008; De Biasio et al. 2008). Disordered regions usually bind to multiple targets with low affinity, which is ideal for signal transduction (Dunker et al. 2002). Some recent findings indicate that certain ordered proteins function only when the proportion of their ordered structure decreases, that is, they need partial or complete functional misfolding (Uversky 2015).

13.4 Characterization of IDP and IDPRs

In the past two decades, rapid progress in the exploration of IDPs has radically changed the understanding and perceived importance of the field. The high occurrence of IDPs in cellular organization has increased the demand for new perspectives in structural and functional studies. The conformational flexibility of IDPs does not allow them to be studied accurately with older structural techniques, so new methods are needed to study their functional aspects (Habchi et al. 2014). Structural studies of IDPs have shed light on the critical point that disorder is encoded in the amino acid sequence of a protein. Thorough studies of IDP sequences and structural information suggest that disordered regions show low hydrophobicity and high net charge, and are characterized by a large hydrodynamic radius, high structural heterogeneity, and poor secondary structure organization (Uversky 2019). They may, however, tend to gain structure in the presence of natural ligands. On the basis of these structural and sequence-based data, various algorithms, known as disorder predictors, have been designed to predict the disorder propensity of protein regions. These bioinformatics tools are commonly used to characterize protein disorder. Sequence stretches with a high proportion of hydrophilic residues can be analyzed with web servers and databases such as DisProt (Sickmeier et al. 2007), IUPred (Dosztányi et al. 2005), PONDR (Obradovic et al. 2003), PrDOS (Ishida and Kinoshita 2007), D2P2 (Oates et al. 2013), and ESpritz (Walsh et al. 2012), which report the probability of disorder for those regions. Some of the test sets for structural predictions have been further confirmed by experimental tools such as NMR and X-ray studies (Konrat 2014; Brutscher et al. 2015). This supports the reliability of disorder predictors and improves the knowledge of the functional relevance of IDPs and IDPRs in various organisms.
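The two sequence properties mentioned above, hydrophobicity and net charge, can be estimated directly from a sequence. The following is a minimal sketch and not the algorithm of any of the predictors cited here. It assumes the Kyte-Doolittle hydropathy scale rescaled to the range 0 to 1, counts Lys and Arg as +1 and Asp and Glu as −1, and treats His as neutral. The function names and the toy sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not specified in the text).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(sequence):
    """Mean hydropathy per residue, rescaled from [-4.5, 4.5] to [0, 1]."""
    scaled = [(KYTE_DOOLITTLE[res] + 4.5) / 9.0 for res in sequence]
    return sum(scaled) / len(scaled)


def mean_net_charge(sequence):
    """Absolute net charge per residue (Lys/Arg = +1, Asp/Glu = -1, His neutral)."""
    positive = sum(sequence.count(res) for res in "KR")
    negative = sum(sequence.count(res) for res in "DE")
    return abs(positive - negative) / len(sequence)


# Toy sequences for illustration only (not real proteins).
charged_like = "KKEKSPKKGSEK"
hydrophobic_like = "LIVFAVILLVAF"
for name, seq in [("charged", charged_like), ("hydrophobic", hydrophobic_like)]:
    print(name, round(mean_hydropathy(seq), 3), round(mean_net_charge(seq), 3))
```

A sequence with low mean hydropathy and high mean net charge falls on the side of the plot where disordered proteins are expected, which is the qualitative idea behind charge-hydropathy analysis.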

13.5 Molecular Dynamics Simulations: Relevance with Structure Biology

The three-dimensional (3D) structures of biological macromolecules (e.g., proteins) and chemically synthesized polymers are essential for structural biology and for applications in drug discovery. Structure elucidation through X-ray, NMR, and cryo-EM techniques has been an advantage for understanding their structure-function paradigm. However, many proteins cannot form rigid three-dimensional structures, so their thermodynamic properties, microscopic energies, and specific interactions with other molecules at the atomic level cannot be understood well through experimental methods (Chong et al. 2017). Therefore, the characterization of such proteins at the atomic level is more feasible through atomistic computational simulations than through experiments. Molecular dynamics (MD) simulations are capable of determining the conformational dynamics, structural composition, and organization of proteins in an aqueous environment (Hollingsworth and Dror 2018). Additionally, the interaction of proteins with lipid molecules, inhibitors, partner proteins, etc. can also be determined through MD simulations. Owing to advancements in computer hardware, it is now possible to explore such macromolecules in greater depth and over longer timescales, up to seconds, to meet experimental observations (Perilla et al. 2015). Various simulation packages such as Desmond (Bowers et al. 2006), GROMACS (Berendsen et al. 1995), Amber (Ponder and Case 2003), NAMD (Phillips et al. 2005), etc. are available with different optimized force fields. Generally, a force field (FF) describes the interatomic potential energy of a system as a function of the atomic coordinates, through terms for bonds, angles, torsions (dihedrals), etc. (Jorgensen and Tirado-Rives 2005; González 2011). The atoms, held together by simple harmonic or elastic forces, represent a molecule within the region specified for simulation (González 2011). Van der Waals and electrostatic interactions are also integral constituents of a force field. The overall equation that defines a force field is

$$ U_{\text{potential}} = \sum_{\text{bonds}} \frac{1}{2} k_b \left(r - r_0\right)^2 + \sum_{\text{angles}} \frac{1}{2} k_a \left(\theta - \theta_0\right)^2 + \sum_{\text{torsions}} \frac{V_n}{2} \left[1 + \cos\left(n\phi - \delta\right)\right] + \sum_{\text{improper}} V_{\text{imp}} + \sum_{\text{LJ}} 4\epsilon_{ij} \left(\frac{\sigma_{ij}^{12}}{r_{ij}^{12}} - \frac{\sigma_{ij}^{6}}{r_{ij}^{6}}\right) + \sum_{\text{electrostatic}} \frac{q_i q_j}{r_{ij}} $$

where the terms for oscillation about the equilibrium bond length and bond angle, torsional rotation of four atoms about a central bond, improper dihedrals, and the nonbonded interactions (Lennard-Jones (LJ) and electrostatic) are summed to give the potential energy.
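The individual terms of the potential energy expression above can be written as small functions. This is an illustrative sketch only: the function names are hypothetical, the parameters are in arbitrary reduced units, and the Coulomb prefactor `k_e` is an assumption added so that a unit system can be chosen (the expression above corresponds to `k_e = 1`).

```python
import math


def harmonic(k, x, x0):
    """Harmonic bond or angle term: 0.5 * k * (x - x0)^2."""
    return 0.5 * k * (x - x0) ** 2


def torsion(v_n, n, phi, delta):
    """Periodic torsion term: (V_n / 2) * [1 + cos(n * phi - delta)]."""
    return 0.5 * v_n * (1.0 + math.cos(n * phi - delta))


def lennard_jones(epsilon, sigma, r):
    """12-6 Lennard-Jones term for one atom pair at distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 ** 2 - sr6)


def coulomb(q_i, q_j, r, k_e=1.0):
    """Electrostatic term for one atom pair; k_e sets the unit system (assumed)."""
    return k_e * q_i * q_j / r


# Toy example: one bond, one torsion, and one nonbonded pair (reduced units).
total = (
    harmonic(k=100.0, x=1.05, x0=1.00)
    + torsion(v_n=2.0, n=3, phi=math.pi / 3, delta=0.0)
    + lennard_jones(epsilon=1.0, sigma=1.0, r=1.5)
    + coulomb(q_i=0.4, q_j=-0.4, r=1.5)
)
print(total)
```

In a real force field each sum runs over all bonds, angles, torsions, and nonbonded pairs of the system, with parameters taken from the force field files rather than chosen by hand as here.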

13.6 MD Force Fields and Their Role in Conformation Dynamics

Several force fields have been developed to date and are used for different purposes in almost every field of science. For biological macromolecules, the OPLS-AA (Jorgensen and Tirado-Rives 1988), GROMOS (Berendsen et al. 1995), CHARMM (Vanommeslaeghe et al. 2009), AMBER (Ponder and Case 2003), and Drude (Li et al. 2017) force fields are optimized to capture conformational change, structural composition, protein aggregation, and binding efficacy over time in a given environment. These force fields differ in their tendency to favor particular structural compositions. For disordered proteins, therefore, the force field must be selected carefully. Ham and colleagues made a thorough comparison of the GROMOS, CHARMM, AMBER, and OPLS force fields to guide the selection of a force field for MD of an IDP. Among them, OPLS-AA shows a well-balanced tendency between the helical and beta propensities of a protein (Chong et al. 2017). For IDPs, OPLS and the more recently introduced CHARMM36 (Huang et al. 2016) are used to simulate disordered regions properly, allowing them to gain a helical or beta structure if it is induced. Two IDPs, amyloid-beta and p53, have been used extensively as model systems for testing different force fields and correlating them with experiments. Carballo-Pacheco and Strodel investigated the accuracy of five force fields on amyloid-beta (Aβ; one of the proteins responsible for Alzheimer’s disease), where CHARMM22, OPLS, and Amber 99sb and 99sb-ildn agreed better with the NMR results than the 99sbildn-NMR force field (Carballo-Pacheco and Strodel 2017). Recently, D. E. Shaw and colleagues modified the Amber ff99sb force field for disordered proteins so that it correlates well with experimental observations. The improved force field, ff99sb-disp, accurately captures the transitions between ordered and disordered states (Robustelli et al. 2018). Additionally, a short disordered peptide of 24 amino acids, Histatin 5, was investigated with a variety of Amber and GROMOS force fields and compared with experiments (Henriques et al. 2015).

Along with the force field, the selection of a water model is essential for the precise evaluation of the interactions and behavior of a protein in an aqueous environment. Water models are defined by their interaction sites, which in the simplest models are centered on the nuclei of the water molecule. The water models most commonly used for protein simulations are SPC (simple point charge; HOH angle 109.47°), TIP3P (104.5°), and TIP4P (104.5°). The TIP3P and TIP4P water models are based on transferable intermolecular potentials (TIP) with three and four interaction sites, respectively (Ouyang and Bettens 2015). A three-point model has three interaction sites because the water molecule has three atoms, while a four-point model has an additional dummy atom to improve the electrostatic distribution (Jorgensen and Tirado-Rives 2005). Five-point and six-point water models are also available, which have dummy atoms representing the lone pairs and one extra interaction site (Fig. 13.2). The water molecules are placed around the protein structure, which is centered in a simulation box (e.g., cubic, dodecahedral, etc.) of defined size in the periodic cell.

Fig. 13.2 Representation of water models based on different interaction sites. Here, O and H are oxygen and hydrogen atoms, while M represents the dummy atom in water models with 4, 5, and 6 interaction sites

13.7 MD Simulation Terminology: Structural Conformation Assessment

13.7.1 Energy Minimization, Equilibration, and Timescale

Several steps lie between building a simulation system and the final production run that generates the trajectory. Before simulating a protein, a minimized structural conformation is vital, because without minimization the system might produce erroneous results due to excess heat caused by large, unwanted forces (Mackay et al. 1989). For simulating IDPs, the very popular steepest descent method is run for a set number of steps or until the system converges below the required energy threshold.
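The idea of steepest descent minimization can be sketched in a few lines. This is a simplified illustration and not the implementation used by any simulation package: the adaptive step rule (grow the step after an accepted move, halve it after a rejected one), the convergence threshold on the largest gradient component, and the one-dimensional Lennard-Jones dimer used as a test system are all assumptions made for the example.

```python
import numpy as np


def steepest_descent(energy_and_grad, x0, step=0.01, f_tol=1e-3, max_steps=5000):
    """Move downhill along the negative gradient until the forces are small.

    energy_and_grad: callable returning (energy, gradient) for coordinates x.
    step: initial step size; f_tol: threshold on the largest gradient component.
    """
    x = np.asarray(x0, dtype=float).copy()
    energy, grad = energy_and_grad(x)
    for _ in range(max_steps):
        if np.max(np.abs(grad)) < f_tol:
            break  # converged: remaining forces are below the threshold
        trial = x - step * grad
        trial_energy, trial_grad = energy_and_grad(trial)
        if trial_energy < energy:
            # Accept the move and try a slightly larger step next time.
            x, energy, grad = trial, trial_energy, trial_grad
            step *= 1.2
        else:
            # Reject the move and shorten the step.
            step *= 0.5
    return x, energy


def lj_dimer(x):
    """Toy system: two atoms at separation x[0], reduced Lennard-Jones units."""
    r = x[0]
    energy = 4.0 * (r ** -12 - r ** -6)
    grad = np.array([4.0 * (-12.0 * r ** -13 + 6.0 * r ** -7)])
    return energy, grad


r_min, e_min = steepest_descent(lj_dimer, x0=[1.5])
print(r_min, e_min)  # separation close to 2**(1/6), energy close to -1
```

For a solvated protein the coordinate vector contains all atoms, and the energy and gradient come from the force field instead of the toy function.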

Afterward, the simulation setup is subjected to equilibration, as the minimized system still has unoptimized solvent (e.g., water) around the protein. Generally, the system is equilibrated in two stages, at constant temperature and at constant pressure, so that all atoms settle into proper positions at the target temperature before the production run. Two ensembles, NVT and NPT, are commonly used for equilibration, in which the number of atoms (N), volume (V), and temperature (T), or the number of atoms, pressure (P), and temperature, respectively, are kept constant. The system is simulated for a short run of a few picoseconds to nanoseconds, or until it is equilibrated.

Finally, the production run for all-atom classical MD is performed at constant temperature and pressure, which are maintained by specific thermostats and barostats, respectively. Thermostats such as Nosé-Hoover, Berendsen, and Langevin, and barostats such as Berendsen (Berendsen et al. 1984), Martyna-Tobias-Klein (MTK) (Lippert et al. 2013), and Parrinello-Rahman (Parrinello and Rahman 1980, 1981), are used as per the requirements of the simulation setup. In the case of IDPs, the Nosé-Hoover (Posch et al. 1986) and Berendsen thermostats are preferred, and the MTK or Parrinello-Rahman barostats are considered, to keep the temperature and pressure around their target average values.
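As an illustration of how a thermostat acts, the weak-coupling scheme of Berendsen et al. (1984) rescales the velocities at each step by a factor that drives the instantaneous temperature toward the target value. The sketch below shows only this scaling factor. The function name and the example numbers are hypothetical, and simulation packages implement the coupling with additional details.

```python
import math


def berendsen_scale(t_current, t_target, dt, tau):
    """Velocity scaling factor of the Berendsen weak-coupling thermostat.

    lambda = sqrt(1 + (dt / tau) * (T_target / T_current - 1)),
    where dt is the time step and tau the coupling time constant (same units).
    """
    return math.sqrt(1.0 + (dt / tau) * (t_target / t_current - 1.0))


# Example with assumed values: system at 290 K coupled to a 300 K bath,
# 2 fs time step, 0.1 ps coupling constant.
factor = berendsen_scale(t_current=290.0, t_target=300.0, dt=0.002, tau=0.1)
print(factor)  # slightly above 1: velocities are scaled up toward 300 K
```

A short coupling constant pulls the temperature to the target quickly, while a long one perturbs the dynamics less.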

The simulation time is another important aspect that needs careful attention. As shown in Table 13.1, the purpose of a simulation should be accomplished within an adequate range of time. This timescale also depends on the number of atoms in the simulation setup and on the computer hardware. A larger number of atoms requires more computing time and vice versa; therefore, all-atom MD typically covers timescales from nanoseconds to a few microseconds. Apart from all-atom MD simulation, coarse-grained (CG) MD is an example in which a group of atoms is treated as a single bead, which reduces the degrees of freedom of the system and allows longer simulations in comparatively less time (Kmiecik et al. 2016).

Table 13.1 Tabulation of the timescale of different simulation techniques based on their applications

13.7.2 RMSD, RMSF and Radius of Gyration

For conformational analysis of a simulation trajectory, the root mean square deviation (RMSD) is the most critical parameter to investigate. RMSD is measured between two sets of atoms (backbone, C-alpha, heavy, or side-chain atoms) at a given time with reference to the initial structure or to a desired reference set of atoms (Kuzmanic and Zagrovic 2010). Similarly, the fluctuation of each residue about its average position over all simulated frames of the trajectory can be calculated; it is known as the root mean square fluctuation (RMSF) (Martínez 2015). Both values indicate the conformational change of a protein in the simulation environment over time. Another parameter, the radius of gyration (RoG), describes the compactness of a protein structure. A well-folded or tightly packed structure has a low RoG value and vice versa.
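These three quantities can be computed from coordinate arrays with a few lines of NumPy. The sketch below is a simplified illustration: it assumes that every frame has already been superimposed on the reference structure (no fitting is performed here), that coordinates are stored as arrays of shape (atoms, 3) or (frames, atoms, 3), and the function names are hypothetical rather than taken from any analysis package.

```python
import numpy as np


def rmsd(coords, ref):
    """RMSD between one frame and a reference, both of shape (n_atoms, 3).

    Assumes the frame is already superimposed on the reference.
    """
    diff = coords - ref
    return np.sqrt((diff ** 2).sum(axis=1).mean())


def rmsf(traj):
    """Per-atom RMSF about the average position; traj has shape (n_frames, n_atoms, 3)."""
    mean_pos = traj.mean(axis=0)
    return np.sqrt(((traj - mean_pos) ** 2).sum(axis=2).mean(axis=0))


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration of one frame of shape (n_atoms, 3)."""
    center = np.average(coords, axis=0, weights=masses)
    sq_dist = ((coords - center) ** 2).sum(axis=1)
    return np.sqrt(np.average(sq_dist, weights=masses))


# Toy trajectory: 2 frames, 2 atoms. Atom 0 is fixed, atom 1 moves along x.
traj = np.array([
    [[0.0, 0.0, 0.0], [-1.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
])
print(rmsd(traj[1], traj[0]))   # deviation of frame 1 from frame 0
print(rmsf(traj))               # per-atom fluctuation
print(radius_of_gyration(traj[1], masses=np.array([1.0, 1.0])))
```

In practice the RMSD is usually reported after a least-squares fit to the reference, and the RMSF is often averaged per residue.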

Simulations of many IDPs, which tend to convert into an ordered form upon interaction with physiological partners, are also sensitive to the choice of force field. In the last two decades, computer hardware power has increased steadily, and consequently MD simulations have become very effective for exploring the folding and unfolding of a protein under different conditions, such as mixtures of multiple solvents, membranes, ions, etc., at constant or varying temperature and pH. Also, various categories of simulation have made it easier to investigate structural dynamics over longer timescales. One such method is replica exchange (RE) MD, in which multiple replicas run simultaneously at different temperatures (from lower to higher), selected on the basis of literature or experimental evidence, and conformations are swapped between replicas to reach low-energy conformations (Sugita and Okamoto 1999). In the next section, we discuss REMD with a well-suited example, p53, a tumor suppressor protein.
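The exchange step of temperature REMD can be sketched as follows. Two neighboring replicas swap conformations with a Metropolis probability that depends on their potential energies and temperatures (Sugita and Okamoto 1999). This is a minimal sketch: the function names are hypothetical, energies are assumed to be in kcal/mol, and the geometric spacing of the temperature ladder is a common choice assumed here, not a detail taken from the studies discussed in this chapter.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def exchange_probability(e_i, t_i, e_j, t_j):
    """Metropolis probability of swapping replicas i and j.

    e_i, e_j: potential energies (kcal/mol) of the conformations currently
    at temperatures t_i and t_j (K).
    P = min(1, exp[(beta_i - beta_j) * (e_i - e_j)]), with beta = 1 / (k_B * T).
    """
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    if delta >= 0.0:
        return 1.0  # always accept; also avoids overflow in exp
    return math.exp(delta)


def temperature_ladder(t_min, t_max, n_replicas):
    """Geometrically spaced temperatures from t_min to t_max (assumed scheme)."""
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio ** k for k in range(n_replicas)]


ladder = temperature_ladder(300.0, 350.0, 4)
print([round(t, 2) for t in ladder])
# The colder replica holds the lower-energy conformation: swap is not certain.
print(exchange_probability(e_i=-120.0, t_i=300.0, e_j=-100.0, t_j=310.0))
# The colder replica holds the higher-energy conformation: swap always accepted.
print(exchange_probability(e_i=-100.0, t_i=300.0, e_j=-120.0, t_j=310.0))
```

The exchanges let conformations trapped at low temperature escape through the high-temperature replicas, which is why the method helps sampling of flexible systems such as IDPs.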

13.8 IDPs and Replica Exchange MD: In Perspective of p53-CTD

In our recent study, we performed a replica exchange molecular dynamics simulation of the p53-CTD using the OPLS 2005 force field, implemented in the Desmond simulation package (Bowers et al. 2006). As mentioned above, the OPLS force field has a proper balance between alpha and beta propensity in simulation. As illustrated in this study, temperature induces changes in the structural conformation of the IDP (p53-CTD), which shows its highly dynamic and flexible nature (Kumar et al. 2020a). Hydrophobic and electrostatic interactions play an important role in the structural conformation of the p53-CTD. Circular dichroism studies showed that higher temperature leads to compaction of the peptide, which is associated with helical structural conformations (Kumar et al. 2020a; Kjaergaard et al. 2010). Previously, NMR studies showed that the temperature-induced structural change is associated with the random (non-helical) stretches of amino acids in a peptide (Kjaergaard et al. 2010). Our results corroborate these findings: the MD simulation showed that the random stretches of amino acids, or non-helical regions, are responsible for the structural transformations.

The p53-CTD adopts random coil conformations and has a tendency to gain structure. According to the REMD analysis, the highest structural compaction occurred at 80 °C, where two major helical regions were formed (Fig. 13.3). The total potential energy of the p53-CTD in the aqueous system was calculated to be more negative (−33914.25 kcal/mol) at high temperature (353.71 K), while at 300 K the potential energy of the system was −30382.4 kcal/mol. The p53-CTD could not fold spontaneously due to its high intrinsic free energy. In the presence of binding partners, the p53-CTD attains a minimum of free energy and a minimum of binding energy with the partner. Thus, the competitive effect of energy minimization results in different characteristic states, where each mechanism determines the sampling frequency of each characteristic state (Han et al. 2017).

Fig. 13.3 Induction of helicity at two regions in p53-CTD at high temperature as obtained from Replica Exchange MD, using OPLS 2005 forcefield

13.9 Future Prospects

In general, IDPs are very challenging to study in atomistic detail, because current simulation force fields do not yet reproduce experimental observations accurately. Great advances have been made to counter these challenges, but much improvement is still needed. Bioinformatics has made such studies possible to a large extent, and molecular dynamics simulations in particular are being used extensively to explore IDPs at the atomic level. The structural properties predicted for IDPs/IDPRs differ markedly among force fields. This offers a clear opportunity for the development of new force fields, or the improvement of current ones, to unravel the atomistic details of the conformational dynamics of IDPs in agreement with experimental measurements. The outcome will lead to a better understanding of the biophysics of IDPs and pave the way for a potential role in drug discovery.