Abstract
In the protein folding problem, conventional simulations in physical statistical mechanical ensembles, such as the canonical ensemble with fixed temperature, face a great difficulty. This is because there exist a huge number of local-minimum-energy states in the system and conventional simulations tend to get trapped in these states, giving wrong results. Generalized-ensemble algorithms are based on artificial unphysical ensembles and overcome this difficulty by performing random walks in potential energy, volume, and other physical quantities or in their corresponding conjugate parameters such as temperature and pressure. The advantage of generalized-ensemble simulations is that they not only avoid getting trapped in energy-local-minimum states but also allow physical quantities to be calculated as functions of temperature or other parameters from a single simulation run. In this article we review the generalized-ensemble algorithms. Four examples, the multicanonical algorithm, the replica-exchange method, the replica-exchange multicanonical algorithm, and the multicanonical replica-exchange method, are described in detail. Examples of their applications to the protein folding problem are presented.
Keywords
- Generalized-ensemble algorithm
- Multicanonical algorithm
- Replica-exchange molecular dynamics
- Replica-exchange multicanonical algorithm
- Multicanonical replica-exchange method
- Protein folding
1.1 Introduction
In order to study the protein folding problem, molecular simulation methods such as Monte Carlo (MC) and molecular dynamics (MD) are often used. However, conventional canonical simulations at physically relevant temperatures tend to get trapped in energy-local-minimum states, giving wrong results. A class of simulation methods referred to as generalized-ensemble algorithms overcomes this difficulty (for reviews see, e.g., Refs. [1–5]). In a generalized-ensemble algorithm, each state is weighted by an artificial, non-Boltzmann probability weight factor so that random walks in potential energy, volume, and other physical quantities, or in their corresponding conjugate parameters such as temperature and pressure, may be realized. The random walks allow the simulation to overcome energy barriers and to sample a much wider conformational space than conventional methods.
One of the effective generalized-ensemble algorithms for molecular simulations is the multicanonical algorithm (MUCA) [6, 7], which was first applied to the protein folding problem in Ref. [8]. In this method, the weight factor is defined to be inversely proportional to the density of states and a free random walk in potential energy space is realized. Another effective generalized-ensemble algorithm is the replica-exchange method (REM) [9] (the method is also referred to as parallel tempering [10]), which was first applied to the protein folding problem in Ref. [11]. In this method, a number of non-interacting copies (or, replicas) of the original system at different temperatures are simulated independently and exchanged with a specified transition probability. The details of the molecular dynamics algorithm for REM, which is referred to as replica-exchange molecular dynamics (REMD), have been worked out in Ref. [12], and this led to a wide application of REMD to protein and other biomolecular systems. One is naturally led to combine MUCA and REM, and two methods, the replica-exchange multicanonical algorithm (REMUCA) and the multicanonical replica-exchange method (MUCAREM), have been developed [13–15]. MUCAREM can be considered to be a special case of the multidimensional (or, multivariable) extension of REM, which we refer to as the multidimensional replica-exchange method (MREM) [16]. MREM is now widely used and is often referred to as the Hamiltonian replica-exchange method [17].
In this article, we describe the generalized-ensemble algorithms mentioned above. Namely, we review the four methods: MUCA, REM, REMUCA, and MUCAREM. Examples of the results in which these methods were applied to the protein folding problem are then presented.
1.2 Methods
1.2.1 Multicanonical Algorithm
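Let us consider a system of \(N\) atoms of mass \(m_k\) (\(k = 1, \dots, N\)) with their coordinate vectors and momentum vectors denoted by \(q \equiv (q_1, \dots, q_N)\) and \(p \equiv (p_1, \dots, p_N)\), respectively. The Hamiltonian \(H(q,p)\) of the system is the sum of the kinetic energy \(K(p)\) and the potential energy \(E(q)\):
$$ H(q,p) = K(p) + E(q), $$ (1.1)
where
$$ K(p) = \sum_{k=1}^{N} \frac{p_k^2}{2 m_k}. $$ (1.2)
In the canonical ensemble at temperature \(T\) each state \(x \equiv (q,p)\) with the Hamiltonian \(H(q,p)\) is weighted by the Boltzmann factor:
$$ W_{\rm B}(x;T) = \exp\left(-\beta H(q,p)\right), $$ (1.3)
where the inverse temperature \(\beta\) is defined by \(\beta = 1/(k_{\rm B} T)\) (\(k_{\rm B}\) is the Boltzmann constant). The average kinetic energy at temperature \(T\) is then given by
$$ \left\langle K(p) \right\rangle_T = \left\langle \sum_{k=1}^{N} \frac{p_k^2}{2 m_k} \right\rangle_T = \frac{3}{2} N k_{\rm B} T. $$ (1.4)
Because the coordinates \(q\) and momenta \(p\) are decoupled in Eq. (1.1), we can suppress the kinetic energy part and write the Boltzmann factor as
$$ W_{\rm B}(x;T) = W_{\rm B}(E;T) = \exp\left(-\beta E\right). $$ (1.5)
The canonical probability distribution of potential energy \(P_{NVT}(E;T)\) is then given by the product of the density of states \(n(E)\) and the Boltzmann weight factor \(W_{\rm B}(E;T)\):
$$ P_{NVT}(E;T) \propto n(E)\, W_{\rm B}(E;T). $$ (1.6)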
Because \(n(E)\) is a rapidly increasing function and the Boltzmann factor decreases exponentially, the canonical ensemble yields a bell-shaped distribution of potential energy which has a maximum around the average energy at temperature \(T\). Conventional MC or MD simulations at constant temperature are expected to yield \(P_{NVT}(E;T)\). An MC simulation based on the Metropolis algorithm [18] is performed with the following transition probability from a state \(x\) of potential energy \(E\) to a state \(x'\) of potential energy \(E'\):
$$ w\left(x \to x'\right) = \min\left(1, \exp\left(-\varDelta\right)\right), $$ (1.7)
where
$$ \varDelta = \beta \left(E' - E\right). $$ (1.8)
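As a minimal sketch of the Metropolis criterion just described, the acceptance test can be written as a single function that accepts a move with probability min(1, exp(−Δ)), where Δ = β(E′ − E). The function name `metropolis_accept` and its argument names are illustrative choices, not part of the original text.

```python
import math
import random


def metropolis_accept(e_old, e_new, beta, rng=random):
    """Return True if a move from energy e_old to e_new is accepted.

    Implements w = min(1, exp(-delta)) with delta = beta * (e_new - e_old).
    `rng` is any object with a random() method (hypothetical parameter).
    """
    delta = beta * (e_new - e_old)
    if delta <= 0.0:
        return True  # downhill or equal-energy moves are always accepted
    return rng.random() < math.exp(-delta)
```

Treating Δ ≤ 0 separately avoids evaluating a large positive exponent for strongly downhill moves.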
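An MD simulation, on the other hand, is based on the following Newton equations of motion:
$$ \dot{q}_k = \frac{p_k}{m_k}, $$ (1.9)
$$ \dot{p}_k = -\frac{\partial E}{\partial q_k} = f_k, $$ (1.10)
where \(f_k\) is the force acting on the \(k\)-th atom (\(k = 1, \dots, N\)). This set of equations actually yields the microcanonical ensemble, however, and we have to add a thermostat in order to obtain the canonical ensemble at temperature \(T\). Here, we just follow Nosé’s prescription [19, 20], and we have
$$ \dot{q}_k = \frac{p_k}{m_k}, $$ (1.11)
$$ \dot{p}_k = -\frac{\partial E}{\partial q_k} - \frac{\dot{s}}{s}\, p_k = f_k - \frac{\dot{s}}{s}\, p_k, $$ (1.12)
$$ \dot{s} = s\, \frac{P_s}{Q}, $$ (1.13)
$$ \dot{P}_s = \sum_{k=1}^{N} \frac{p_k^2}{m_k} - 3 N k_{\rm B} T = 3 N k_{\rm B} \left(T(t) - T\right), $$ (1.14)
where \(s\) is Nosé’s scaling parameter, \(P_s\) is its conjugate momentum, \(Q\) is its mass, and the “instantaneous temperature” \(T(t)\) is defined by
$$ T(t) = \frac{1}{3 N k_{\rm B}} \sum_{k=1}^{N} \frac{p_k^2}{m_k}. $$ (1.15)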
However, in practice, it is very difficult to obtain accurate canonical distributions of complex systems at low temperatures by conventional MC or MD simulation methods. This is because simulations at low temperatures tend to get trapped in one or a few local-minimum-energy states. This difficulty is overcome by, for instance, the generalized-ensemble algorithms, which greatly enhance conformational sampling.
In the multicanonical ensemble [6, 7], on the other hand, each state is weighted by a non-Boltzmann weight factor \(W_{\rm MUCA}(E)\) (which we refer to as the multicanonical weight factor) so that a uniform potential energy distribution \(P_{\rm MUCA}(E)\) is obtained:
$$ P_{\rm MUCA}(E) \propto n(E)\, W_{\rm MUCA}(E) \equiv {\rm constant}. $$ (1.16)
The flat distribution implies that a free random walk in the potential energy space is realized in this ensemble. This allows the simulation to escape from any local-minimum-energy state and to sample the configurational space much more widely than the conventional canonical MC or MD methods.
The definition in Eq. (1.16) implies that the multicanonical weight factor is inversely proportional to the density of states, and we can write it as follows:
$$ W_{\rm MUCA}(E) \equiv \exp\left[-\beta_0\, {\mathcal{E}}_{\rm MUCA}(E;T_0)\right] = \frac{1}{n(E)}, $$ (1.17)
where we have chosen an arbitrary reference temperature, \(T_0 = 1/(k_{\rm B} \beta_0)\), and the “multicanonical potential energy” is defined by
$$ {\mathcal{E}}_{\rm MUCA}(E;T_0) \equiv k_{\rm B} T_0 \ln n(E) = T_0\, S(E). $$ (1.18)
Here, \(S(E)\) is the entropy in the microcanonical ensemble. Because the density of states of the system is usually unknown, the multicanonical weight factor has to be determined numerically by iterations of short preliminary runs [6, 7].
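A multicanonical MC simulation is performed, for instance, with the usual Metropolis criterion [18]: the transition probability from state \(x\) with potential energy \(E\) to state \(x'\) with potential energy \(E'\) is given by
$$ w\left(x \to x'\right) = \min\left(1, \exp\left(-\varDelta\right)\right), $$ (1.19)
where
$$ \varDelta = \beta_0 \left\{ {\mathcal{E}}_{\rm MUCA}(E';T_0) - {\mathcal{E}}_{\rm MUCA}(E;T_0) \right\}. $$ (1.20)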
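The multicanonical Metropolis criterion can be illustrated on a toy system whose density of states is known exactly, so that no iterative weight determination is needed. The sketch below is an assumption-laden example, not the authors' code: it uses `n_spins` independent two-level units with E equal to the number of excited units, for which n(E) is a binomial coefficient. With W_MUCA(E) = 1/n(E), the quantity Δ = β₀{E_MUCA(E′) − E_MUCA(E)} reduces to ln n(E′) − ln n(E), independent of the reference temperature.

```python
import math
import random


def multicanonical_walk(n_spins=10, n_steps=400_000, seed=1):
    """Multicanonical MC for n_spins independent two-level units (toy model).

    E = number of excited units, so n(E) = C(n_spins, E) is known exactly.
    Returns the energy histogram, which should come out roughly flat.
    """
    rng = random.Random(seed)
    log_n = [math.log(math.comb(n_spins, e)) for e in range(n_spins + 1)]
    spins = [0] * n_spins
    energy = 0
    hist = [0] * (n_spins + 1)
    for _ in range(n_steps):
        k = rng.randrange(n_spins)
        e_new = energy + (1 - 2 * spins[k])  # flipping 0 -> 1 raises E by 1
        # delta = beta_0 * (E_MUCA(E') - E_MUCA(E)) = ln n(E') - ln n(E)
        delta = log_n[e_new] - log_n[energy]
        if delta <= 0.0 or rng.random() < math.exp(-delta):
            spins[k] ^= 1
            energy = e_new
        hist[energy] += 1
    return hist
```

In a canonical run at low temperature the same system would sit almost entirely at small E; here every energy level, including the rare E = 0 and E = `n_spins` states, is visited about equally often.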
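The MD algorithm in the multicanonical ensemble also naturally follows from Eq. (1.17): a regular constant-temperature MD simulation (with \(T = T_0\)) is performed with \(E\) replaced by \({\mathcal{E}}_{\rm MUCA}\) in Eq. (1.12) [21, 22]:
$$ \dot{p}_k = -\frac{\partial {\mathcal{E}}_{\rm MUCA}(E;T_0)}{\partial q_k} - \frac{\dot{s}}{s}\, p_k = \frac{\partial {\mathcal{E}}_{\rm MUCA}(E;T_0)}{\partial E}\, f_k - \frac{\dot{s}}{s}\, p_k. $$ (1.21)
From Eq. (1.18) this equation can be rewritten as
$$ \dot{p}_k = \frac{T_0}{T(E)}\, f_k - \frac{\dot{s}}{s}\, p_k, $$ (1.22)
where the following thermodynamic relation gives the definition of the “effective temperature” \(T(E)\):
$$ \left. \frac{\partial S(E)}{\partial E} \right|_{E = E_a} = \frac{1}{T(E_a)}, $$ (1.23)
with
$$ E_a = \left\langle E \right\rangle_{T(E_a)}. $$ (1.24)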
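If the exact multicanonical weight factor \(W_{\rm MUCA}(E)\) is known, one can calculate the ensemble average of any physical quantity \(A\) at any temperature \(T\) (\(= 1/(k_{\rm B} \beta)\)) as follows:
$$ \left\langle A \right\rangle_T = \frac{\sum_E A(E)\, P_{NVT}(E;T)}{\sum_E P_{NVT}(E;T)} = \frac{\sum_E A(E)\, n(E) \exp\left(-\beta E\right)}{\sum_E n(E) \exp\left(-\beta E\right)}, $$ (1.25)
where the density of states is given by (see Eq. (1.17))
$$ n(E) = \frac{1}{W_{\rm MUCA}(E)}. $$ (1.26)
The summation instead of integration is used in Eq. (1.25), because we often discretize the potential energy \(E\) with step size \(\epsilon\) (\(E = E_i\); \(i = 1, 2, \dots\)). Here, the explicit form of the physical quantity \(A\) should be known as a function of potential energy \(E\). For instance, \(A(E) = E\) gives the average potential energy \(\langle E \rangle_T\) as a function of temperature, and \(A(E) = \beta^2 \left(E - \langle E \rangle_T\right)^2\) gives the specific heat.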
In general, the multicanonical weight factor W MUCA(E), or the density of states n(E), is not a priori known, and one needs its estimator for a numerical simulation. This estimator is usually obtained from iterations of short trial multicanonical simulations. However, the iterative process can be non-trivial and very tedious for complex systems.
In practice, it is impossible to obtain the ideal multicanonical weight factor with a completely uniform potential energy distribution. The question is when to stop the iteration for the weight factor determination. Our criterion for a satisfactory weight factor is that a random walk in potential energy space is obtained; the probability distribution \(P_{\rm MUCA}(E)\) does not have to be completely flat, and deviations of, say, an order of magnitude are tolerated. In such a case, we usually perform with this weight factor a multicanonical simulation with high statistics (production run) in order to get an even better estimate of the density of states. Let \(N_{\rm MUCA}(E)\) be the histogram of the potential energy distribution \(P_{\rm MUCA}(E)\) obtained by this production run. The best estimate of the density of states can then be given by the single-histogram reweighting techniques [23] as follows (see the proportionality relation in Eq. (1.16)):
$$ n(E) = \frac{N_{\rm MUCA}(E)}{W_{\rm MUCA}(E)}. $$ (1.27)
By substituting this quantity into Eq. (1.25), one can calculate ensemble averages of a physical quantity \(A(E)\) as a function of temperature. Moreover, ensemble averages of any physical quantity \(A\) (including those that cannot be expressed as functions of potential energy) at any temperature \(T\) (\(= 1/(k_{\rm B} \beta)\)) can now be obtained as long as one stores the “trajectory” of configurations (and \(A\)) from the production run. Namely, we have
$$ \left\langle A \right\rangle_T = \frac{\sum_{k=1}^{n_0} A(x(k))\, W_{\rm MUCA}^{-1}\left(E(x(k))\right) \exp\left[-\beta E(x(k))\right]}{\sum_{k=1}^{n_0} W_{\rm MUCA}^{-1}\left(E(x(k))\right) \exp\left[-\beta E(x(k))\right]}, $$ (1.28)
where \(x(k)\) is the configuration at the \(k\)-th MC (or MD) step and \(n_0\) is the total number of configurations stored. Note that when \(A\) is a function of \(E\), Eq. (1.28) reduces to Eq. (1.25) where the density of states is given by Eq. (1.27).
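Equations (1.25) and (1.28), or any other equations that involve summations of exponential functions, often encounter numerical difficulties such as overflows. These can be overcome by using, for instance, the following equation [24, 25]: for \(C = A + B\) (with \(A > 0\) and \(B > 0\)) we have
$$ \ln C = \ln\left[ \max(A,B) \left( 1 + \frac{\min(A,B)}{\max(A,B)} \right) \right] = \max\left(\ln A, \ln B\right) + \ln\left\{ 1 + \exp\left[ \min\left(\ln A, \ln B\right) - \max\left(\ln A, \ln B\right) \right] \right\}. $$ (1.29)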
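The logarithmic addition rule and the reweighting sum over energies can be combined in a few lines. The following is a minimal sketch, assuming a discrete energy grid and a known ln n(E); the helper names `log_add` and `canonical_average` are hypothetical. The partition sum is accumulated entirely in log space, so neither n(E) nor exp(−βE) is ever formed on its own.

```python
import math


def log_add(log_a, log_b):
    """Return ln(A + B) given finite ln A and ln B, without forming A or B."""
    hi, lo = max(log_a, log_b), min(log_a, log_b)
    return hi + math.log1p(math.exp(lo - hi))


def canonical_average(values, energies, log_n, beta):
    """<A>_T = sum_E A(E) n(E) exp(-beta E) / sum_E n(E) exp(-beta E).

    values[i] is A(E_i), energies[i] is E_i, log_n[i] is ln n(E_i).
    """
    log_w = [ln_n - beta * e for ln_n, e in zip(log_n, energies)]
    log_z = log_w[0]
    for lw in log_w[1:]:
        log_z = log_add(log_z, lw)
    # every exponent below is <= 0, so no overflow can occur
    return sum(a * math.exp(lw - log_z) for a, lw in zip(values, log_w))
```

For example, for ten independent two-level units with unit excitation energy, `canonical_average(E, E, ln_n, beta)` with `ln_n[e] = ln C(10, e)` reproduces the analytic average 10/(1 + e^β).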
1.2.2 Replica-Exchange Method
The replica-exchange method (REM) is another effective generalized-ensemble algorithm. The system for REM consists of \(M\) non-interacting copies (or, replicas) of the original system in the canonical ensemble at \(M\) different temperatures \(T_m\) (\(m = 1, \dots, M\)). We arrange the replicas so that there is always exactly one replica at each temperature. Then there exists a one-to-one correspondence between replicas and temperatures; the label \(i\) (\(= 1, \dots, M\)) for replicas is a permutation of the label \(m\) (\(= 1, \dots, M\)) for temperatures, and vice versa:
$$ i = i(m) \equiv f(m), \qquad m = m(i) \equiv f^{-1}(i), $$ (1.30)
where \(f(m)\) is a permutation function of \(m\) and \(f^{-1}(i)\) is its inverse.
Let \(X = \left\{ x_1^{[i(1)]}, \dots, x_M^{[i(M)]} \right\} = \left\{ x_{m(1)}^{[1]}, \dots, x_{m(M)}^{[M]} \right\}\) stand for a “state” in this generalized ensemble. Each “substate” \(x_m^{[i]}\) is specified by the coordinates \(q^{[i]}\) and momenta \(p^{[i]}\) of \(N\) atoms in replica \(i\) at temperature \(T_m\):
$$ x_m^{[i]} \equiv \left( q^{[i]}, p^{[i]} \right)_m. $$ (1.31)
Because the replicas are non-interacting, the weight factor for the state \(X\) in this generalized ensemble is given by the product of Boltzmann factors for each replica (or at each temperature):
$$ W_{\rm REM}(X) = \prod_{i=1}^{M} \exp\left\{ -\beta_{m(i)} H\left( q^{[i]}, p^{[i]} \right) \right\} = \prod_{m=1}^{M} \exp\left\{ -\beta_m H\left( q^{[i(m)]}, p^{[i(m)]} \right) \right\}, $$ (1.32)
where \(i(m)\) and \(m(i)\) are the permutation functions in Eq. (1.30).
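We now consider exchanging a pair of replicas in this ensemble. Suppose we exchange replicas \(i\) and \(j\), which are at temperatures \(T_m\) and \(T_n\), respectively:
$$ X = \left\{ \dots, x_m^{[i]}, \dots, x_n^{[j]}, \dots \right\} \to X' = \left\{ \dots, x_m^{[j]\prime}, \dots, x_n^{[i]\prime}, \dots \right\}. $$ (1.33)
The exchange of replicas can be written in more detail as
$$ x_m^{[i]} \equiv \left( q^{[i]}, p^{[i]} \right)_m \to x_m^{[j]\prime} \equiv \left( q^{[j]}, p^{[j]\prime} \right)_m, \qquad x_n^{[j]} \equiv \left( q^{[j]}, p^{[j]} \right)_n \to x_n^{[i]\prime} \equiv \left( q^{[i]}, p^{[i]\prime} \right)_n, $$ (1.34)
where the definitions for \(p^{[i]\prime}\) and \(p^{[j]\prime}\) will be given below.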
In the original implementation of the replica-exchange method (REM) [9], a Monte Carlo algorithm was used, and only the coordinates \(q\) (and the potential energy function \(E(q)\)) had to be taken into account. In a molecular dynamics algorithm, on the other hand, we also have to deal with the momenta \(p\). We proposed the following momentum assignment in Eq. (1.34) [12]:
$$ p^{[i]\prime} \equiv \sqrt{\frac{T_n}{T_m}}\; p^{[i]}, \qquad p^{[j]\prime} \equiv \sqrt{\frac{T_m}{T_n}}\; p^{[j]}, $$ (1.35)
which we believe is the simplest and the most natural. This assignment means that we just rescale uniformly the velocities of all the atoms in the replicas by the square root of the ratio of the two temperatures so that the temperature condition in Eq. (1.4) may be satisfied immediately after the replica exchange is accepted. We remark that similar momentum rescaling formulae for various constant-temperature algorithms have been worked out in Ref. [26].
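The transition probability of this replica-exchange process is given by the usual Metropolis criterion:
$$ w\left(X \to X'\right) \equiv w\left( x_m^{[i]} \,\middle|\, x_n^{[j]} \right) = \min\left( 1, \frac{W_{\rm REM}(X')}{W_{\rm REM}(X)} \right) = \min\left(1, \exp\left(-\varDelta\right)\right), $$ (1.36)
where in the second expression (i.e., \(w( x_m^{[i]} | x_n^{[j]} )\)) we explicitly wrote the pair of replicas (and temperatures) to be exchanged. From Eqs. (1.1), (1.2), (1.32), and (1.35), we have
$$ \varDelta = \beta_m \left( E\left(q^{[j]}\right) - E\left(q^{[i]}\right) \right) - \beta_n \left( E\left(q^{[j]}\right) - E\left(q^{[i]}\right) \right) $$ (1.37)
$$ = \left( \beta_m - \beta_n \right) \left( E\left(q^{[j]}\right) - E\left(q^{[i]}\right) \right). $$ (1.38)
Note that after introducing the momentum rescaling in Eq. (1.35), we have the same Metropolis criterion for replica exchanges, i.e., Eqs. (1.36) and (1.38), for both MC and MD versions.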
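A minimal sketch of the two ingredients of an exchange attempt follows: the quantity Δ = (β_m − β_n)(E(q^[j]) − E(q^[i])) that enters the Metropolis test, and the uniform momentum rescaling by the square root of the temperature ratio. The function names and the flat-list representation of momenta are illustrative assumptions.

```python
import math


def exchange_delta(beta_m, beta_n, e_i, e_j):
    """Delta for swapping replica i (at T_m, energy e_i) with replica j (at T_n, energy e_j)."""
    return (beta_m - beta_n) * (e_j - e_i)


def rescale_momenta(p_i, p_j, t_m, t_n):
    """Momenta after an accepted swap: replica i moves T_m -> T_n, replica j moves T_n -> T_m.

    p_i and p_j are flat lists of momentum components (hypothetical layout).
    """
    scale = math.sqrt(t_n / t_m)
    return [scale * p for p in p_i], [p / scale for p in p_j]
```

Because every component is multiplied by the same factor, the kinetic energy of replica i changes by exactly T_n/T_m, which is why the kinetic terms cancel and only the potential energies remain in Δ.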
Without loss of generality we can assume that T 1 < T 2 < … < T M . The lowest temperature T 1 should be sufficiently low so that the simulation can explore the experimentally relevant temperature region, and the highest temperature T M should be sufficiently high so that no trapping in an energy-local-minimum state occurs. A REM simulation is then realized by alternately performing the following two steps:
1. Each replica in the canonical ensemble at its fixed temperature is simulated simultaneously and independently for a certain number of MC or MD steps.
2. A pair of replicas at neighboring temperatures, say \(x_m^{[i]}\) and \(x_{m+1}^{[j]}\), are exchanged with the probability \(w( x_m^{[i]} | x_{m+1}^{[j]} )\) in Eq. (1.36).
A random walk in “temperature space” is realized for each replica, which in turn induces a random walk in potential energy space. This alleviates the problem of getting trapped in states of energy local minima.
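The two alternating steps can be sketched for a one-dimensional toy problem. The example below is an illustration under stated assumptions, not the authors' implementation: the double-well potential `energy`, the temperature ladder, the step size, and the alternation between even and odd neighbor pairs are all choices made here. It uses the MC version, so no momenta are involved, and it swaps configurations between temperature slots rather than relabeling temperatures.

```python
import math
import random


def energy(x):
    """Toy double-well potential with minima at x = -1 and x = +1 (barrier height 4)."""
    return 4.0 * (x * x - 1.0) ** 2


def replica_exchange_mc(temps, n_sweeps=20_000, n_local=10, step=0.5, seed=0):
    """Return the x values recorded at the lowest temperature temps[0]."""
    rng = random.Random(seed)
    betas = [1.0 / t for t in temps]
    x = [-1.0] * len(temps)  # x[m]: configuration currently at temps[m]
    samples = []
    for sweep in range(n_sweeps):
        # Step 1: independent canonical Metropolis moves at each temperature
        for m, beta in enumerate(betas):
            for _ in range(n_local):
                trial = x[m] + rng.uniform(-step, step)
                delta = beta * (energy(trial) - energy(x[m]))
                if delta <= 0.0 or rng.random() < math.exp(-delta):
                    x[m] = trial
        # Step 2: exchange attempts between neighboring temperatures,
        # alternating between pairs (0,1),(2,3),... and (1,2),(3,4),...
        for m in range(sweep % 2, len(temps) - 1, 2):
            delta = (betas[m] - betas[m + 1]) * (energy(x[m + 1]) - energy(x[m]))
            if delta <= 0.0 or rng.random() < math.exp(-delta):
                x[m], x[m + 1] = x[m + 1], x[m]
        samples.append(x[0])
    return samples
```

Called as `replica_exchange_mc([0.25, 0.5, 1.0, 2.0])`, the lowest temperature is a small fraction of the barrier height, so a single canonical run started at x = −1 would be expected to stay in that well; with exchanges, the samples at the lowest temperature populate both wells.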
After a long production run of a replica-exchange simulation, the canonical expectation value of a physical quantity \(A\) at temperature \(T_m\) (\(m = 1, \dots, M\)) can be calculated by the usual arithmetic mean:
$$ \left\langle A \right\rangle_{T_m} = \frac{1}{n_m} \sum_{k=1}^{n_m} A\left( x_m(k) \right), $$ (1.39)
where \(x_m(k)\) (\(k = 1, \dots, n_m\)) are the configurations obtained at temperature \(T_m\) and \(n_m\) is the total number of measurements made at \(T = T_m\). The expectation value at any intermediate temperature \(T\) (\(= 1/(k_{\rm B} \beta)\)) can also be obtained from Eq. (1.25), where the density of states \(n(E)\) in Eq. (1.25) is now given by the multiple-histogram reweighting techniques, or the weighted histogram analysis method (WHAM) [27, 28], as follows. Let \(N_m(E)\) and \(n_m\) be respectively the potential-energy histogram and the total number of samples obtained at temperature \(T_m = 1/(k_{\rm B} \beta_m)\) (\(m = 1, \dots, M\)). The best estimate of the density of states is then given by
$$ n(E) = \frac{\sum_{m=1}^{M} N_m(E)}{\sum_{m=1}^{M} n_m \exp\left( f_m - \beta_m E \right)}, $$ (1.40)
where we have for each \(m\) (\(= 1, \dots, M\))
$$ \exp\left( -f_m \right) = \sum_E n(E) \exp\left( -\beta_m E \right). $$ (1.41)
Note that Eqs. (1.40) and (1.41) are solved self-consistently by iteration [27, 28] to obtain the density of states n(E) and the dimensionless Helmholtz free energy f m . Namely, we can set all the f m (m = 1, …, M) to, e.g., zero initially. We then use Eq. (1.40) to obtain n(E), which is substituted into Eq. (1.41) to obtain next values of f m , and so on.
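The self-consistent iteration just described (start from f_m = 0, compute n(E) from the histograms, recompute f_m, and repeat) can be sketched as follows. This is a plain fixed-point iteration written in log space; the function name `wham`, the convergence tolerance, and the list-of-lists histogram layout are assumptions of this sketch, and no statistical-inefficiency factors are included.

```python
import math


def _log_sum_exp(values):
    hi = max(values)
    return hi + math.log(sum(math.exp(v - hi) for v in values))


def wham(hist, n_samples, betas, energies, tol=1e-10, max_iter=10_000):
    """Solve the two WHAM equations by direct iteration.

    hist[m][e] is N_m(E_e) and n_samples[m] is n_m. Every energy bin is
    assumed to hold a nonzero total count (empty bins would need masking).
    Returns (ln n(E), f_m); both are fixed only up to one additive constant.
    """
    n_rep, n_bins = len(betas), len(energies)
    log_num = [math.log(sum(hist[m][e] for m in range(n_rep))) for e in range(n_bins)]
    log_n = list(log_num)
    f = [0.0] * n_rep
    for _ in range(max_iter):
        # ln n(E) = ln sum_m N_m(E) - ln sum_m n_m exp(f_m - beta_m E)
        log_n = [
            log_num[e] - _log_sum_exp(
                [math.log(n_samples[m]) + f[m] - betas[m] * energies[e]
                 for m in range(n_rep)])
            for e in range(n_bins)
        ]
        # f_m = -ln sum_E n(E) exp(-beta_m E)
        f_new = [
            -_log_sum_exp([log_n[e] - betas[m] * energies[e] for e in range(n_bins)])
            for m in range(n_rep)
        ]
        converged = max(abs(a - b) for a, b in zip(f_new, f)) < tol
        f = f_new
        if converged:
            break
    return log_n, f
```

Only differences such as ln n(E) − ln n(E′) are meaningful in the output, which is all that the reweighting formula for canonical averages requires.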
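Moreover, ensemble averages of any physical quantity \(A\) (including those that cannot be expressed as functions of potential energy) at any temperature \(T\) (\(= 1/(k_{\rm B} \beta)\)) can now be obtained from the “trajectory” of configurations of the production run. Namely, we first obtain \(f_m\) (\(m = 1, \dots, M\)) by solving Eqs. (1.40) and (1.41) self-consistently, and then we have [14]
$$ \left\langle A \right\rangle_T = \frac{\displaystyle \sum_{m=1}^{M} \sum_{k=1}^{n_m} A\left(x_m(k)\right) \frac{\exp\left[-\beta E\left(x_m(k)\right)\right]}{\sum_{l=1}^{M} n_l \exp\left[ f_l - \beta_l E\left(x_m(k)\right) \right]}}{\displaystyle \sum_{m=1}^{M} \sum_{k=1}^{n_m} \frac{\exp\left[-\beta E\left(x_m(k)\right)\right]}{\sum_{l=1}^{M} n_l \exp\left[ f_l - \beta_l E\left(x_m(k)\right) \right]}}, $$ (1.42)
where \(x_m(k)\) (\(k = 1, \dots, n_m\)) are the configurations obtained at temperature \(T_m\).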
1.2.3 Replica-Exchange Multicanonical Algorithm and Multicanonical Replica-Exchange Method
The replica-exchange multicanonical algorithm (REMUCA) [13–15] overcomes both the difficulties of MUCA (the multicanonical weight factor determination is non-trivial) and REM (a lot of replicas, or computation time, is required). In REMUCA we first perform a short REM simulation (with M replicas) to determine the multicanonical weight factor and then perform with this weight factor a regular multicanonical simulation with high statistics. The first step is accomplished by the multiple-histogram reweighting techniques. Let N m (E) and n m be respectively the potential-energy histogram and the total number of samples obtained at temperature T m (= 1/k B β m ) of the REM run. The density of states n(E) is then given by solving Eqs. (1.40) and (1.41) self-consistently by iteration.
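Once the estimate of the density of states is obtained, the multicanonical weight factor can be directly determined from Eq. (1.17) (see also Eq. (1.18)). Actually, the density of states \(n(E)\) and the multicanonical potential energy \({\mathcal{E}}_{\rm MUCA}(E;T_0)\) thus determined are only reliable in the following range:
$$ E_1 \le E \le E_M, $$ (1.43)
where
$$ E_1 = \left\langle E \right\rangle_{T_1}, \qquad E_M = \left\langle E \right\rangle_{T_M}, $$ (1.44)
and \(T_1\) and \(T_M\) are respectively the lowest and the highest temperatures used in the REM run. Outside this range we extrapolate the multicanonical potential energy linearly [13]:
$$ {\mathcal{E}}_{\rm MUCA}^{\{0\}}(E) \equiv \begin{cases} \left. \dfrac{\partial {\mathcal{E}}_{\rm MUCA}(E;T_0)}{\partial E} \right|_{E=E_1} \left( E - E_1 \right) + {\mathcal{E}}_{\rm MUCA}(E_1;T_0), & \text{for } E < E_1, \\[6pt] {\mathcal{E}}_{\rm MUCA}(E;T_0), & \text{for } E_1 \le E \le E_M, \\[6pt] \left. \dfrac{\partial {\mathcal{E}}_{\rm MUCA}(E;T_0)}{\partial E} \right|_{E=E_M} \left( E - E_M \right) + {\mathcal{E}}_{\rm MUCA}(E_M;T_0), & \text{for } E > E_M. \end{cases} $$ (1.45)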
The multicanonical MC and MD runs are then performed respectively with the Metropolis criterion of Eq. (1.19) and with the modified Newton equation in Eq. (1.21), in which \( {\mathbf{\mathcal{E}}}_{{\rm MUCA}}^{\left\{0\right\}}(E) \) in Eq. (1.45) is substituted into E MUCA(E;T 0). We expect to obtain a flat potential energy distribution in the range of Eq. (1.43). Finally, the results are analyzed by the single-histogram reweighting techniques as described in Eq. (1.27) (and Eq. (1.25)).
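The linear extrapolation outside the reliable range amounts to continuing the multicanonical potential energy along its tangent at each end point. A minimal sketch, assuming the estimate and its derivative are available as callables (the names `e_muca`, `d_e_muca`, and `extrapolated_muca_energy` are hypothetical):

```python
def extrapolated_muca_energy(e, e_muca, d_e_muca, e_1, e_m):
    """Piecewise multicanonical potential energy with linear tails.

    e_muca(E)   : estimate, trusted only for e_1 <= E <= e_m
    d_e_muca(E) : its derivative with respect to E
    """
    if e < e_1:
        return d_e_muca(e_1) * (e - e_1) + e_muca(e_1)
    if e > e_m:
        return d_e_muca(e_m) * (e - e_m) + e_muca(e_m)
    return e_muca(e)
```

The result is continuous with a continuous first derivative at both end points, so the force scaling factor in a multicanonical MD run does not jump when the trajectory leaves the range covered by the REM histograms.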
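Some remarks are now in order. From Eqs. (1.18), (1.23), (1.24), and (1.44), Eq. (1.45) becomes
$$ {\mathcal{E}}_{\rm MUCA}^{\{0\}}(E) = \begin{cases} \dfrac{T_0}{T_1} \left( E - E_1 \right) + T_0 S(E_1), & \text{for } E < E_1, \\[6pt] T_0 S(E), & \text{for } E_1 \le E \le E_M, \\[6pt] \dfrac{T_0}{T_M} \left( E - E_M \right) + T_0 S(E_M), & \text{for } E > E_M. \end{cases} $$ (1.46)
The Newton equation in Eq. (1.21) is then written as (see Eqs. (1.22), (1.23), and (1.24))
$$ \dot{p}_k = \begin{cases} \dfrac{T_0}{T_1}\, f_k - \dfrac{\dot{s}}{s}\, p_k, & \text{for } E < E_1, \\[6pt] \dfrac{T_0}{T(E)}\, f_k - \dfrac{\dot{s}}{s}\, p_k, & \text{for } E_1 \le E \le E_M, \\[6pt] \dfrac{T_0}{T_M}\, f_k - \dfrac{\dot{s}}{s}\, p_k, & \text{for } E > E_M. \end{cases} $$ (1.47)
Because only the product of inverse temperature \(\beta\) and potential energy \(E\) enters the Boltzmann factor (see Eq. (1.5)), a rescaling of the potential energy (or force) by a constant, say \(\alpha\), can be considered as the rescaling of the temperature by \(1/\alpha\) [21]. Hence, our choice of \( {\mathbf{\mathcal{E}}}_{{\rm MUCA}}^{\left\{0\right\}}(E) \) in Eq. (1.45) results in a canonical simulation at \(T = T_1\) for \(E < E_1\), a multicanonical simulation for \(E_1 \le E \le E_M\), and a canonical simulation at \(T = T_M\) for \(E > E_M\). Note also that the above arguments are independent of the value of \(T_0\), and we will get the same results regardless of its value.
For the Monte Carlo method, the above statement follows directly from the following equation. Namely, our choice of the multicanonical potential energy in Eq. (1.45) gives (by substituting Eq. (1.46) into Eq. (1.17))
$$ W_{\rm MUCA}(E) = \exp\left[ -\beta_0\, {\mathcal{E}}_{\rm MUCA}^{\{0\}}(E) \right] = \begin{cases} \dfrac{\exp\left[ -\beta_1 \left( E - E_1 \right) \right]}{n(E_1)} \propto \exp\left( -\beta_1 E \right), & \text{for } E < E_1, \\[6pt] \dfrac{1}{n(E)}, & \text{for } E_1 \le E \le E_M, \\[6pt] \dfrac{\exp\left[ -\beta_M \left( E - E_M \right) \right]}{n(E_M)} \propto \exp\left( -\beta_M E \right), & \text{for } E > E_M. \end{cases} $$ (1.48)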
We now present the multicanonical replica-exchange method (MUCAREM) [13–15]. In MUCAREM the production run is a REM simulation with a few replicas not in the canonical ensemble but in the multicanonical ensemble, i.e., different replicas perform MUCA simulations with different energy ranges. While MUCA simulations are usually based on local updates, a replica-exchange process can be considered to be a global update, and global updates enhance the sampling further.
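Let \( \mathbf{\mathcal{M}} \) be the number of replicas for a MUCAREM simulation. Here, each replica is in one-to-one correspondence not with temperature but with multicanonical weight factors of different energy range. Note that because multicanonical simulations cover much wider energy ranges than regular canonical simulations, the number of required replicas for the production run of MUCAREM is much smaller than that for the regular REM (\( \mathbf{\mathcal{M}}\ll M \)). The weight factor for this generalized ensemble is now given by (see Eq. (1.32))
$$ W_{\rm MUCAREM}(X) = \prod_{i=1}^{\mathcal{M}} W_{\rm MUCA}^{\{m(i)\}}\left( E\left( x_{m(i)}^{[i]} \right) \right) = \prod_{m=1}^{\mathcal{M}} W_{\rm MUCA}^{\{m\}}\left( E\left( x_m^{[i(m)]} \right) \right), $$ (1.49)
where we prepare the multicanonical weight factor (and the density of states) separately for \( \mathbf{\mathcal{M}} \) regions (see Eq. (1.17)):
$$ W_{\rm MUCA}^{\{m\}}\left( E\left( x_m^{[i]} \right) \right) = \exp\left[ -\beta_m\, {\mathcal{E}}_{\rm MUCA}^{\{m\}}\left( E\left( x_m^{[i]} \right) \right) \right] \equiv \frac{1}{n^{\{m\}}\left( E\left( x_m^{[i]} \right) \right)}. $$ (1.50)
Here, we have introduced \( \mathbf{\mathcal{M}} \) arbitrary reference temperatures \(T_m\) (\(= 1/(k_{\rm B} \beta_m)\)) \( \left(m=1,\dots, \mathbf{\mathcal{M}}\right) \), but the final results will be independent of the values of \(T_m\), as one can see from the second equality in Eq. (1.50) (these arbitrary temperatures are necessary only for MD simulations).
Each multicanonical weight factor \(W_{\rm MUCA}^{\{m\}}(E)\), or the density of states \(n^{\{m\}}(E)\), is defined as follows. For each \( m\left(m=1,\dots, \mathbf{\mathcal{M}}\right) \), we assign a pair of temperatures \(\left( T_{\rm L}^{\{m\}}, T_{\rm H}^{\{m\}} \right)\). Here, we assume that \(T_{\rm L}^{\{m\}} < T_{\rm H}^{\{m\}}\) and arrange the temperatures so that the neighboring regions covered by the pairs have sufficient overlaps. Without loss of generality we can assume \( {T}_{{\rm L}}^{\left\{1\right\}}<\dots <{T}_{{\rm L}}^{\left\{\mathbf{\mathcal{M}}\right\}} \) and \( {T}_{{\rm H}}^{\left\{1\right\}}<\dots <{T}_{{\rm H}}^{\left\{\mathbf{\mathcal{M}}\right\}} \). We define the following quantities:
$$ E_{\rm L}^{\{m\}} = \left\langle E \right\rangle_{T_{\rm L}^{\{m\}}}, \qquad E_{\rm H}^{\{m\}} = \left\langle E \right\rangle_{T_{\rm H}^{\{m\}}} \qquad \left( m = 1, \dots, \mathcal{M} \right). $$ (1.51)
Suppose that the multicanonical weight factor \(W_{\rm MUCA}(E)\) (or equivalently, the multicanonical potential energy \({\mathcal{E}}_{\rm MUCA}(E;T_0)\) in Eq. (1.18)) has been obtained as in REMUCA or by any other methods in the entire energy range of interest (\( {E}_{{\rm L}}^{\left\{1\right\}}<E<{E}_{{\rm H}}^{\left\{\mathbf{\mathcal{M}}\right\}} \)). We then have for each \( m\left(m=1,\dots, \mathbf{\mathcal{M}}\right) \) the following multicanonical potential energies (see Eq. (1.45)) [13]:
$$ {\mathcal{E}}_{\rm MUCA}^{\{m\}}(E) = \begin{cases} \left. \dfrac{\partial {\mathcal{E}}_{\rm MUCA}(E;T_m)}{\partial E} \right|_{E=E_{\rm L}^{\{m\}}} \left( E - E_{\rm L}^{\{m\}} \right) + {\mathcal{E}}_{\rm MUCA}\left( E_{\rm L}^{\{m\}};T_m \right), & \text{for } E < E_{\rm L}^{\{m\}}, \\[6pt] {\mathcal{E}}_{\rm MUCA}(E;T_m), & \text{for } E_{\rm L}^{\{m\}} \le E \le E_{\rm H}^{\{m\}}, \\[6pt] \left. \dfrac{\partial {\mathcal{E}}_{\rm MUCA}(E;T_m)}{\partial E} \right|_{E=E_{\rm H}^{\{m\}}} \left( E - E_{\rm H}^{\{m\}} \right) + {\mathcal{E}}_{\rm MUCA}\left( E_{\rm H}^{\{m\}};T_m \right), & \text{for } E > E_{\rm H}^{\{m\}}. \end{cases} $$ (1.52)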
Finally, a MUCAREM simulation is realized by alternately performing the following two steps.
1. Each replica of the fixed multicanonical ensemble is simulated simultaneously and independently for a certain number of MC or MD steps.
2. A pair of replicas, say \(i\) and \(j\), which are in neighboring multicanonical ensembles, say the \(m\)-th and \((m+1)\)-th, respectively, are exchanged:
$$ X=\left\{\dots, {x}_m^{\left[i\right]},\dots, {x}_{m+1}^{\left[j\right]},\dots \right\}\to {X}^{\prime }=\left\{\dots, {x}_m^{\left[j\right]},\dots, {x}_{m+1}^{\left[i\right]},\dots \right\}. $$ (1.53)
The transition probability of this replica exchange is given by the Metropolis criterion:
$$ w\left(X\to {X}^{\prime}\right)= \min \left(1, \exp \left(-\varDelta \right)\right), $$ (1.54)
where we now have (see Eq. (1.37)) [13]
$$ \begin{aligned}[b] \varDelta & = {\beta}_m\left\{{\mathbf{\mathcal{E}}}_{{\rm MUCA}}^{\left\{m\right\}}\left(E\left({q}^{\left[j\right]}\right)\right)-{\mathbf{\mathcal{E}}}_{{\rm MUCA}}^{\left\{m\right\}}\left(E\left({q}^{\left[i\right]}\right)\right)\right\}\\[6pt] &\quad -{\beta}_{m+1}\left\{{\mathbf{\mathcal{E}}}_{{\rm MUCA}}^{\left\{m+1\right\}}\left(E\left({q}^{\left[j\right]}\right)\right)-{\mathbf{\mathcal{E}}}_{{\rm MUCA}}^{\left\{m+1\right\}}\left(E\left({q}^{\left[i\right]}\right)\right)\right\}. \end{aligned} $$ (1.55)
Here, \(E(q^{[i]})\) and \(E(q^{[j]})\) are the potential energies of the \(i\)-th replica and the \(j\)-th replica, respectively.
Note that in Eq. (1.55) we need to newly evaluate the multicanonical potential energy, \( {\mathbf{\mathcal{E}}}_{{\rm MUCA}}^{\left\{m\right\}}\left(E\left({q}^{\left[j\right]}\right)\right) \) and \( {\mathbf{\mathcal{E}}}_{{\rm MUCA}}^{\left\{m+1\right\}}\left(E\left({q}^{\left[i\right]}\right)\right) \), because \( {\mathbf{\mathcal{E}}}_{{\rm MUCA}}^{\left\{m\right\}}(E) \) and \( {\mathbf{\mathcal{E}}}_{{\rm MUCA}}^{\left\{n\right\}}(E) \) are, in general, different functions for m ≠ n.
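The point about newly evaluating the multicanonical potential energies can be made concrete with a short sketch of Δ in Eq. (1.55). Each ensemble supplies its own function, and each function must be evaluated at both potential energies; the callables `e_muca_m` and `e_muca_m1` and the function name are illustrative assumptions.

```python
def mucarem_delta(beta_m, beta_m1, e_muca_m, e_muca_m1, e_i, e_j):
    """Delta for swapping replica i (ensemble m) with replica j (ensemble m+1).

    e_muca_m, e_muca_m1 : multicanonical potential energy functions of the
                          m-th and (m+1)-th ensembles (hypothetical callables)
    e_i, e_j            : potential energies E(q^[i]) and E(q^[j])
    """
    return (beta_m * (e_muca_m(e_j) - e_muca_m(e_i))
            - beta_m1 * (e_muca_m1(e_j) - e_muca_m1(e_i)))
```

If both functions are the identity, the expression reduces to the canonical REM result (β_m − β_{m+1})(E(q^[j]) − E(q^[i])); if both ensembles share the same function and reference temperature, Δ vanishes and every exchange is accepted.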
In this algorithm, the \(m\)-th multicanonical ensemble actually results in a canonical simulation at \(T = T_{\rm L}^{\{m\}}\) for \(E < E_{\rm L}^{\{m\}}\), a multicanonical simulation for \(E_{\rm L}^{\{m\}} \le E \le E_{\rm H}^{\{m\}}\), and a canonical simulation at \(T = T_{\rm H}^{\{m\}}\) for \(E > E_{\rm H}^{\{m\}}\), while the replica-exchange process samples states of the whole energy range (\( {E}_{{\rm L}}^{\left\{1\right\}}\le E\le {E}_{{\rm H}}^{\left\{\mathbf{\mathcal{M}}\right\}} \)).
To obtain the canonical distributions at any intermediate temperature \(T\), the multiple-histogram reweighting techniques are again used. Let \(N_m(E)\) and \(n_m\) be respectively the potential-energy histogram and the total number of samples obtained with the multicanonical weight factor \(W_{\rm MUCA}^{\{m\}}(E)\) \( \left(m=1,\dots, \mathbf{\mathcal{M}}\right) \). The expectation value of a physical quantity \(A\) at any temperature \(T\) (\(= 1/(k_{\rm B} \beta)\)) is then obtained from Eq. (1.25), where the best estimate of the density of states is obtained by solving the WHAM equations, which now read [13]
$$ n(E) = \frac{\sum_{m=1}^{\mathcal{M}} N_m(E)}{\sum_{m=1}^{\mathcal{M}} n_m \exp\left( f_m \right) W_{\rm MUCA}^{\{m\}}(E)}, $$ (1.56)
where we have for each \( m\left(=1,\dots, \mathbf{\mathcal{M}}\right) \)
$$ \exp\left( -f_m \right) = \sum_E n(E)\, W_{\rm MUCA}^{\{m\}}(E). $$ (1.57)
Note that \(W_{\rm MUCA}^{\{m\}}(E)\) is used here in place of the Boltzmann factor \(\exp(-\beta_m E)\) that appears in Eqs. (1.40) and (1.41).
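Moreover, ensemble averages of any physical quantity \(A\) (including those that cannot be expressed as functions of potential energy) at any temperature \(T\) (\(= 1/(k_{\rm B} \beta)\)) can now be obtained from the “trajectory” of configurations of the production run. Namely, we first obtain \( {f}_m\left(m=1,\dots, \mathbf{\mathcal{M}}\right) \) by solving Eqs. (1.56) and (1.57) self-consistently, and then we have [14]
$$ \left\langle A \right\rangle_T = \frac{\displaystyle \sum_{m=1}^{\mathcal{M}} \sum_{k=1}^{n_m} A\left(x_m(k)\right) \frac{\exp\left[-\beta E\left(x_m(k)\right)\right]}{\sum_{l=1}^{\mathcal{M}} n_l \exp\left( f_l \right) W_{\rm MUCA}^{\{l\}}\left( E\left(x_m(k)\right) \right)}}{\displaystyle \sum_{m=1}^{\mathcal{M}} \sum_{k=1}^{n_m} \frac{\exp\left[-\beta E\left(x_m(k)\right)\right]}{\sum_{l=1}^{\mathcal{M}} n_l \exp\left( f_l \right) W_{\rm MUCA}^{\{l\}}\left( E\left(x_m(k)\right) \right)}}, $$ (1.58)
where the trajectories \(x_m(k)\) (\(k = 1, \dots, n_m\)) are taken from each multicanonical simulation with the multicanonical weight factor \(W_{\rm MUCA}^{\{m\}}(E)\) \( \left(m=1,\dots, \mathbf{\mathcal{M}}\right) \) separately.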
As seen above, both REMUCA and MUCAREM can be used to obtain the multicanonical weight factor, or the density of states, for the entire potential energy range of interest. For complex systems, however, a single REMUCA or MUCAREM simulation is often insufficient. In such cases we can iterate MUCA (in REMUCA) and/or MUCAREM simulations in which the estimate of the multicanonical weight factor is updated by the single- and/or multiple-histogram reweighting techniques, respectively.
To be more specific, this iterative process can be summarized as follows. The REMUCA production run corresponds to a MUCA simulation with the weight factor W MUCA(E). The new estimate of the density of states can be obtained by the single-histogram reweighting techniques of Eq. (1.27). On the other hand, from the MUCAREM production run, the improved density of states can be obtained by the multiple-histogram reweighting techniques of Eqs. (1.56) and (1.57).
The improved density of states thus obtained leads to a new multicanonical weight factor (see Eq. (1.17)). The next iteration can be either a MUCA production run (as in REMUCA) or MUCAREM production run. The results of this production run may yield an optimal multicanonical weight factor that yields a sufficiently flat energy distribution for the entire energy range of interest. If not, we can repeat the above process by obtaining the third estimate of the multicanonical weight factor either by a MUCA production run (as in REMUCA) or by a MUCAREM production run, and so on.
We remark that as the estimate of the multicanonical weight factor becomes more accurate, fewer replicas are required for a successful MUCAREM simulation, because each replica will have a flat energy distribution over a wider energy range. Hence, for a large, complex system, it is often more efficient to first try MUCAREM and iteratively reduce the number of replicas so that eventually one needs only one or a few replicas (instead of trying REMUCA directly from the beginning and iterating MUCA simulations).
1.3 Simulation Results
We now present some examples of the simulation results by the algorithms described in the previous section. The computer code developed in Refs. [12, 13, 29, 30], which is based on the version 2 of PRESTO [31], was used after modifications that were necessary for each calculation.
The first example is the C-peptide of ribonuclease A in explicit water [32]. The N-terminus and the C-terminus of the C-peptide analogue were blocked with the acetyl group and the N-methyl group, respectively. The number of amino acids is 13 and the amino-acid sequence is Ace-Ala-Glu−-Thr-Ala-Ala-Ala-Lys+-Phe-Leu-Arg+-Ala-His+-Ala-Nme [33, 34]. It is known from experiments that this peptide forms α-helix structures [33, 34]. The initial configuration of our simulation was first generated by a high-temperature molecular dynamics simulation (at T = 1,000 K) in the gas phase, starting from a fully extended conformation. We randomly selected one of the structures that had no secondary structure such as α-helix or β-sheet. The peptide was then solvated in a sphere of radius 22 Å, in which 1,387 water molecules were included (see Fig. 1.1). A harmonic restraint was applied to prevent the water molecules from leaving the sphere. The total number of atoms is 4,365. The dielectric constant was set equal to 1.0. The force-field parameters for the protein were taken from the all-atom version of AMBER parm99 [35], which was found to be suitable for studying helical peptides [36, 37], and the TIP3P model [38] was used for water molecules. The unit time step, Δt, was set to 0.5 fs. The parameter values of the simulations performed are summarized in Table 1.1.
We first performed a REMD simulation with 32 replicas for 100 ps per replica (REMD1 in Table 1.1). During this REMD simulation, replica exchange was tried every 200 MD steps. Using the obtained potential-energy histogram of each replica as input data to the multiple-histogram analysis in Eqs. (1.40) and (1.41), we obtained the first estimate of the multicanonical weight factor, or the density of states. We divided this multicanonical weight factor into four multicanonical weight factors that cover different energy regions [13–15] and assigned them to four replicas (the weight factors cover the potential energy ranges from −13791.5 to −11900.5 kcal/mol, from −12962.5 to −10796.5 kcal/mol, from −11900.5 to −9524.5 kcal/mol, and from −10796.5 to −8293.5 kcal/mol). We then carried out a MUCAREM simulation with four replicas for 1 ns per replica (MUCAREM1 in Table 1.1), in which replica exchange was tried every 1,000 MD steps. We again used the potential-energy histogram of each replica as the input data to the multiple-histogram analysis and finally obtained the multicanonical weight factor with high precision. As a production run, we carried out a 15-ns multicanonical MD simulation with one replica (REMUCA1 in Table 1.1), and the results of this production run were analyzed in detail.
In Fig. 1.2 we show the probability distributions of potential energy that were obtained from the above three generalized-ensemble simulations, namely, REMD1, MUCAREM1, and REMUCA1. We see in Fig. 1.2a that there is enough overlap between all pairs of neighboring canonical distributions, suggesting that a sufficient number of replica exchanges took place in REMD1. We see in Fig. 1.2b that there are good overlaps between all pairs of neighboring multicanonical distributions, implying that MUCAREM1 also performed properly. Finally, the multicanonical distribution in Fig. 1.2c is completely flat between around −13,000 kcal/mol and around −8,000 kcal/mol. The results suggest that a free random walk was realized in this energy range.
In Fig. 1.3a we show the time series of potential energy from REMUCA1. We indeed observe a random walk covering as much as 5,000 kcal/mol of energy range. We show in Fig. 1.3b the average potential energy as a function of temperature, which was obtained from the trajectory of REMUCA1 by the reweighting techniques. The average potential energy monotonically increases as the temperature increases.
The accuracy of the calculated average quantities depends on the “quality” of the random walk in the potential energy space, and a measure of this quality is given by the number of tunneling events [7, 15]. One tunneling event is defined by a trajectory that goes from E H to E L and back, where E H and E L are the values near the highest energy and the lowest energy, respectively, which the random walk can reach. If E H is sufficiently high, the trajectory gets completely uncorrelated when it reaches E H. On the other hand, when the trajectory reaches near E L, it tends to get trapped in local-minimum states. We thus consider that the more tunneling events we observe during a fixed number of MC/MD steps, the more efficient the method is as a generalized-ensemble algorithm (or, the more reliable the average quantities obtained by the reweighting techniques are). Here, we took E H = −8,250 kcal/mol and E L = −12,850 kcal/mol for the measurement of the tunneling events. The random walk in REMUCA1 yielded as many as 55 tunneling events in 15 ns. The corresponding numbers of tunneling events for REMD1 and for MUCAREM1 were 0 in 3.2 ns and 5 in 4 ns, respectively. Hence, REMUCA is the most efficient and reliable among the three generalized-ensemble algorithms.
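Counting tunneling events from a stored energy time series can be sketched with a small state machine. This sketch assumes one particular reading of the definition, namely that an event is a completed round trip E_H → E_L → E_H; the function name and the threshold conventions (at or above E_H, at or below E_L) are choices made here.

```python
def count_tunneling_events(energies, e_high, e_low):
    """Count round trips E_H -> E_L -> E_H in an energy time series."""
    count = 0
    state = "start"  # "start", "high" (last visited E_H), "low" (reached E_L after E_H)
    for e in energies:
        if e >= e_high:
            if state == "low":
                count += 1  # returned to E_H after having visited E_L
            state = "high"
        elif e <= e_low and state == "high":
            state = "low"
    return count
```

With the thresholds quoted above, a call would look like `count_tunneling_events(series, -8250.0, -12850.0)`. Dividing the quoted counts by the total simulation lengths gives about 3.7 events per ns for REMUCA1 (55 in 15 ns) and 1.25 per ns for MUCAREM1 (5 in 4 ns).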
In Fig. 1.4 the potential of mean force along the first two principal component axes at 300 K is shown. There exist three distinct minima in the free-energy landscape, which correspond to three local-minimum-energy states. We show representative conformations at these minima in Fig. 1.5. The structure of the global-minimum free-energy state (GM) has a partially distorted α-helix with the salt bridge between Glu−-2 and Arg+-10. The structure is in good agreement with the experimental structure obtained by both NMR and X-ray experiments. In this structure there also exists a contact between Phe-8 and His+-12. This contact is again observed in the corresponding residues of the X-ray structure. At LM1 the structure has a contact between Phe-8 and His+-12, but the salt bridge between Glu−-2 and Arg+-10 is not formed. On the other hand, the structure at LM2 has this salt bridge, but it does not have a contact between Phe-8 and His+-12. Thus, only the structures at GM satisfy all of the interactions that have been observed by the X-ray and other experimental studies.
The next example is the C-terminal β-hairpin of streptococcal protein G B1 domain [39]. This peptide is sometimes referred to as G-peptide [40] and is known from experiments to form β-hairpin structures in aqueous solution [41, 42]. The number of amino acids is 16 and the amino-acid sequence is: Gly-Glu−-Trp-Thr-Tyr-Asp−-Asp−-Ala-Thr-Lys+-Thr-Phe-Thr-Val-Thr-Glu−. The N-terminus and C-terminus were set to be in the zwitterionic form (NH3+ and COO−), following the conditions in the experiments. The GROMOS96 (43a1) force field [43] was used for the solute molecule, and the SPC model [44] was employed for the solvent water molecules according to the GROMOS prescription. We first performed a REMD simulation of G-peptide without explicit solvent, starting from a fully extended polypeptide conformation. In this simulation, we used the distance-dependent dielectric constant. We then selected the final conformation of the replica that was at the highest temperature at the end of the simulation. This conformation was soaked in a water cap whose radius was 26 Å. Before starting the MUCAREM simulation, we performed a 100-ps REMD simulation with 64 replicas twice. (The first was used to optimize the temperature table for the second.) Using the results of the second REMD simulation, we determined the initial multicanonical weight factor. By iterating cycles of a short MUCAREM simulation with 8 replicas and an update to a new weight factor [15], we refined the multicanonical weight factor. After that we performed a MUCAREM MD simulation with 8 replicas for 34.75 ns (per replica) as a production run. Thus, the total production MD length was 278 ns. In total, three independent folding events were observed in three different replicas, so the average simulation length per observed folding event was 92.7 ns. This suggests that MUCAREM can accelerate G-peptide folding by a factor of more than 60 relative to conventional MD simulations, because the experimental folding time of G-peptide is 6 μs [45].
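The arithmetic behind the quoted acceleration factor can be written out explicitly. The short sketch below only restates the numbers given in the text; the variable names are illustrative.

```python
n_replicas = 8
length_per_replica_ns = 34.75
folding_events = 3
experimental_folding_time_ns = 6000.0  # 6 microseconds, from Ref. [45]

# Total production length summed over all replicas.
total_ns = n_replicas * length_per_replica_ns

# Average simulated time needed to observe one folding event.
ns_per_event = total_ns / folding_events

# Ratio of the experimental folding time to the simulated time per event.
speedup = experimental_folding_time_ns / ns_per_event
```

With these inputs the total is 278 ns, the time per event is about 92.7 ns, and the ratio is about 65, consistent with the statement of a more than 60-fold acceleration. With only three observed events, the estimate carries a large statistical uncertainty.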
Figure 1.6 shows, for two replicas, the time series of the heavy-atom root-mean-square deviation (RMSD) from the native configuration (coordinates in the PDB entry 2GB1), together with representative snapshot structures observed in the folding events. These replicas indeed folded into native-like conformations.
We also evaluated the canonical expectation values of secondary-structure contents (β-bridge contents) of each residue at 320 K using the multiple-histogram reweighting techniques in Eqs. (1.56), (1.57), and (1.58). The results are shown in Fig. 1.7. These results are qualitatively similar to the previous ones that were derived from shorter MUCAREM simulations [36, 37]. They clearly imply that the β-hairpin structures are formed at this temperature.
The third example is the chicken villin headpiece subdomain in explicit water [46]. The number of amino acids is 36. The force field CHARMM22 [47] with CMAP [48, 49] and TIP3P water model [38, 47] were used. The number of water molecules was 3,513. The MD time step was 1.0 fs. We made two production runs of about 1 μs, each of which was a MUCAREM simulation with eight replicas. They are referred to as MUCAREM1 and MUCAREM2. The former consisted of 1.127 μs covering the temperature range between 269 and 699 K, and the latter 1.157 μs covering the temperature range between 289 and 699 K.
We consider that the backbone has folded into the native structure from an unfolded one if the mainchain RMSD becomes less than or equal to 3.0 Å. A further folding event is counted separately only if the trajectory passes through an unfolded structure (with the backbone RMSD greater than or equal to 6.5 Å) in between. With this criterion, we observed 11 folding events in seven different replicas (namely, Replicas 5, 7, and 8 in MUCAREM1 and Replicas 1, 2, 4, and 5 in MUCAREM2). In Fig. 1.8 we show the snapshots of the replicas folding into native-like conformations for the two MUCAREM production runs. In Fig. 1.9 we compare the obtained low-RMSD conformations and the native structure. They are indeed very close to the native structure.
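The two-threshold counting criterion described above can be sketched as a small state machine applied to the RMSD time series of one replica. The function name and the `start_unfolded` flag, which encodes the assumption that the run begins from an unfolded structure, are hypothetical and not taken from the original work.

```python
def count_folding_events(rmsd_series, folded_cutoff=3.0, unfolded_cutoff=6.5,
                         start_unfolded=True):
    """Count folding events in a mainchain-RMSD time series (values in angstroms).

    An event is counted when the RMSD drops to folded_cutoff or below, and a
    new event can only be counted after the RMSD has risen to unfolded_cutoff
    or above again.
    """
    events = 0
    armed = start_unfolded  # True once an unfolded structure has been visited
    for rmsd in rmsd_series:
        if rmsd >= unfolded_cutoff:
            armed = True
        elif rmsd <= folded_cutoff and armed:
            events += 1
            armed = False  # require unfolding before the next event
    return events


# Toy series: folds, fluctuates without unfolding, unfolds, folds again.
toy_rmsd = [8.0, 5.0, 2.8, 4.0, 2.5, 7.0, 3.0, 2.0]
n_folding = count_folding_events(toy_rmsd)
```

The gap between the two cutoffs prevents fluctuations around a single threshold from being counted as repeated folding events.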
1.4 Conclusions
In this article we introduced four powerful generalized-ensemble algorithms, namely, multicanonical algorithm (MUCA), replica-exchange method (REM), replica-exchange multicanonical algorithm (REMUCA), and multicanonical replica-exchange method (MUCAREM), which can greatly enhance conformational sampling of biomolecular systems. The results of protein folding simulations by these methods were presented. Because it is very difficult to determine the multicanonical weight factors for very large systems, MUCAREM is the most promising method among the four methods for large biomolecular systems.
References
1. Hansmann UHE, Okamoto Y (1999) New Monte Carlo algorithms for protein folding. Curr Opin Struct Biol 9:177–183
2. Mitsutake A, Sugita Y, Okamoto Y (2001) Generalized-ensemble algorithms for molecular simulations of biopolymers. Biopolymers 60:96–123
3. Sugita Y, Okamoto Y (2002) Free-energy calculations in protein folding by generalized-ensemble algorithms. In: Schlick T, Gan HH (eds) Lecture notes in computational science and engineering. Springer, Berlin, pp 304–332; e-print: cond-mat/0102296
4. Okumura H, Itoh SG, Okamoto Y (2012) Generalized-ensemble algorithms for simulations of complex molecular systems. In: Leszczynski J, Shukla MK (eds) Practical aspects of computational chemistry II: an overview of the last two decades and current trends. Springer, Dordrecht, pp 69–101
5. Sugita Y, Miyashita N, Li P-C, Yoda T, Okamoto Y (2012) Recent applications of replica-exchange molecular dynamics simulations of biomolecules. Curr Phys Chem 2:401–412
6. Berg BA, Neuhaus T (1991) Multicanonical algorithms for 1st order phase transitions. Phys Lett B 267:249–253
7. Berg BA, Neuhaus T (1992) Multicanonical ensemble: a new approach to simulate first-order phase transitions. Phys Rev Lett 68:9–12
8. Hansmann UHE, Okamoto Y (1993) Prediction of peptide conformation by multicanonical algorithm – new approach to the multiple-minima problem. J Comput Chem 14:1333–1338
9. Hukushima K, Nemoto K (1996) Exchange Monte Carlo method and application to spin glass simulations. J Phys Soc Jpn 65:1604–1608
10. Marinari E, Parisi G, Ruiz-Lorenzo JJ (1997) Numerical simulations of spin glass systems. In: Young AP (ed) Spin glasses and random fields. World Scientific, Singapore, pp 59–98
11. Hansmann UHE (1997) Parallel tempering algorithm for conformational studies of biological molecules. Chem Phys Lett 281:140–150
12. Sugita Y, Okamoto Y (1999) Replica-exchange molecular dynamics method for protein folding. Chem Phys Lett 314:141–151
13. Sugita Y, Okamoto Y (2000) Replica-exchange multicanonical algorithm and multicanonical replica-exchange method for simulating systems with rough energy landscape. Chem Phys Lett 329:261–270
14. Mitsutake A, Sugita Y, Okamoto Y (2003) Replica-exchange multicanonical and multicanonical replica-exchange Monte Carlo simulations of peptides. I. Formulation and benchmark test. J Chem Phys 118:6664–6675
15. Mitsutake A, Sugita Y, Okamoto Y (2003) Replica-exchange multicanonical and multicanonical replica-exchange Monte Carlo simulations of peptides. II. Application to a more complex system. J Chem Phys 118:6676–6688
16. Sugita Y, Kitao A, Okamoto Y (2000) Multidimensional replica-exchange method for free-energy calculations. J Chem Phys 113:6042–6051
17. Fukunishi H, Watanabe O, Takada S (2002) On the Hamiltonian replica exchange method for efficient sampling of biomolecular systems: application to protein structure prediction. J Chem Phys 116:9058–9067
18. Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E (1953) Equation of state calculations by fast computing machines. J Chem Phys 21:1087–1092
19. Nosé S (1984) A molecular dynamics method for simulations in the canonical ensemble. Mol Phys 52:255–268
20. Nosé S (1984) A unified formulation of the constant temperature molecular dynamics methods. J Chem Phys 81:511–519
21. Hansmann UHE, Okamoto Y, Eisenmenger F (1996) Molecular dynamics, Langevin and hybrid Monte Carlo simulations in a multicanonical ensemble. Chem Phys Lett 259:321–330
22. Nakajima N, Nakamura H, Kidera A (1997) Multicanonical ensemble generated by molecular dynamics simulation for enhanced conformational sampling of peptides. J Phys Chem B 101:817–824
23. Ferrenberg AM, Swendsen RH (1988) New Monte Carlo technique for studying phase transitions. Phys Rev Lett 61:2635–2638; (1989) ibid 63:1658
24. Berg BA (2004) Markov chain Monte Carlo simulations and their statistical analysis. World Scientific, Singapore, p 253
25. Berg BA (2003) Multicanonical simulations step by step. Comput Phys Commun 153:397–406
26. Mori Y, Okamoto Y (2010) Replica-exchange molecular dynamics simulations for various constant temperature algorithms. J Phys Soc Jpn 79:074001
27. Ferrenberg AM, Swendsen RH (1989) Optimized Monte Carlo data analysis. Phys Rev Lett 63:1195–1198
28. Kumar S, Bouzida D, Swendsen RH, Kollman PA, Rosenberg JM (1992) The weighted histogram analysis method for free-energy calculations on biomolecules. 1. The method. J Comput Chem 13:1011–1021
29. Sugita Y, Kitao A (1998) Improved protein free energy calculation by more accurate treatment of nonbonded energy: application to chymotrypsin inhibitor 2, V57A. Proteins 30:388–400
30. Kitao A, Hayward S, Go N (1998) Energy landscape of a native protein: jumping-among-minima model. Proteins 33:496–517
31. Morikami K, Nakai T, Kidera A, Saito M, Nakamura H (1992) PRESTO (protein engineering simulator): a vectorized molecular dynamics program for biopolymers. Comput Chem 16:243–248
32. Sugita Y, Okamoto Y (2005) Molecular mechanism for stabilizing a short helical peptide studied by generalized-ensemble simulations with explicit solvent. Biophys J 88:3180–3190
33. Shoemaker KR, Kim PS, York EJ, Stewart JM, Baldwin RL (1987) Tests of the helix dipole model for stabilization of alpha-helices. Nature 326:563–567
34. Shoemaker KR, Faiman R, Schultz DA, Robertson AD, York EJ, Stewart JM, Baldwin RL (1990) Side-chain interactions in the C-peptide helix: Phe 8 … His 12+. Biopolymers 29:1–11
35. Wang J, Cieplak P, Kollman PA (2000) How well does a restrained electrostatic potential (RESP) model perform in calculating conformational energies of organic and biological molecules. J Comput Chem 21:1049–1074
36. Yoda T, Sugita Y, Okamoto Y (2004) Comparisons of force fields for proteins by generalized-ensemble simulations. Chem Phys Lett 386:460–467
37. Yoda T, Sugita Y, Okamoto Y (2004) Secondary-structure preferences of force fields for proteins evaluated by generalized-ensemble simulations. Chem Phys 307:269–283
38. Jorgensen WL, Chandrasekhar J, Madura JD, Impey RW, Klein ML (1983) Comparison of simple potential functions for simulating liquid water. J Chem Phys 79:926–935
39. Yoda T, Sugita Y, Okamoto Y (2007) Cooperative folding mechanism of a β-hairpin peptide studied by a multicanonical replica-exchange molecular dynamics simulation. Proteins 66:846–859
40. Honda S, Kobayashi N, Munekata E (2000) Thermodynamics of a β-hairpin structure: evidence for cooperative formation of folding nucleus. J Mol Biol 295:269–278
41. Blanco FJ, Rivas G, Serrano L (1994) A short linear peptide that folds into a native stable β-hairpin in aqueous solution. Nat Struct Biol 1:584–589
42. Kobayashi N, Honda S, Yoshii H, Uedaira H, Munekata E (1995) Complement assembly of two fragments of the streptococcal protein G B1 domain in aqueous solution. FEBS Lett 366:99–103
43. van Gunsteren WF, Billeter SR, Eising AA, Hünenberger PH, Krüger P, Mark AE, Scott WRP, Tironi IG (1996) Biomolecular simulation: the GROMOS96 manual and user guide. Vdf Hochschulverlag AG an der ETH, Zurich
44. Berendsen HJC, Postma JPM, van Gunsteren WF, Hermans J (1981) Interaction models for water in relation to protein hydration. In: Pullman B (ed) Intermolecular forces. Reidel, Dordrecht, pp 331–342
45. Muñoz V, Thompson PA, Hofrichter J, Eaton WA (1997) Folding dynamics and mechanism of β-hairpin formation. Nature 390:196–199
46. Yoda T, Sugita Y, Okamoto Y (2010) Hydrophobic core formation and dehydration in protein folding studied by generalized-ensemble simulations. Biophys J 99:1637–1644
47. MacKerell AD Jr, Bashford D, Bellott M, Dunbrack RL Jr, Evanseck JD, Field MJ, Fischer S, Gao J, Guo H, Ha S, Joseph-McCarthy D, Kuchnir L, Kuczera K, Lau FTK, Mattos C, Michnick S, Ngo T, Nguyen DT, Prodhom B, Reiher WE III, Roux B, Schlenkrich M, Smith JC, Stote R, Straub J, Watanabe M, Wiorkiewicz-Kuczera J, Yin D, Karplus M (1998) All-atom empirical potential for molecular modeling and dynamics studies of proteins. J Phys Chem B 102:3586–3616
48. MacKerell AD Jr, Feig M, Brooks CL III (2004) Improved treatment of the protein backbone in empirical force fields. J Am Chem Soc 126:698–699
49. MacKerell AD Jr, Feig M, Brooks CL III (2004) Extending the treatment of backbone energetics in protein force fields: limitations of gas phase quantum mechanics in reproducing protein conformational distributions in molecular dynamics simulations. J Comput Chem 25:1400–1415
Acknowledgments
This work was supported, in part, by Grants-in-Aid for Scientific Research on Innovative Areas, “Transient Macromolecular Complex” (Y.S.) and “Fluctuations and Biological Functions” (Y.O.), for Computational Materials Science Initiative (Y.O.), and for High Performance Computing Infrastructure (HPCI) (Y.S. and Y.O.) from MEXT, Japan.
© 2014 Springer International Publishing Switzerland
Cite this chapter
Yoda, T., Sugita, Y., Okamoto, Y. (2014). Protein Folding Simulations by Generalized-Ensemble Algorithms. In: Han, Kl., Zhang, X., Yang, Mj. (eds) Protein Conformational Dynamics. Advances in Experimental Medicine and Biology, vol 805. Springer, Cham. https://doi.org/10.1007/978-3-319-02970-2_1
DOI: https://doi.org/10.1007/978-3-319-02970-2_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02969-6
Online ISBN: 978-3-319-02970-2
eBook Packages: Biomedical and Life Sciences (R0)