1 Introduction

Recent discoveries have revealed new biological functions of non-coding RNAs, such as RNA silencing and riboswitches. The functions of non-coding RNAs are intrinsically tied to RNA structure and stability, and can provide potential RNA-based therapeutic strategies. This demands a quantitative understanding and prediction of RNA structure and its stability, i.e., a solution to the RNA folding problem. RNA folding is driven by intra-molecular forces, such as base pairing/stacking interactions, ion-mediated interactions and conformational entropy [1, 2]. This chapter focuses on three aspects of RNA folding: structure prediction, folding kinetics and ion-mediated electrostatic interactions.

First, RNA structure prediction is one of the central issues in the RNA folding problem, since RNA structures determine not only biological functions such as gene regulation, but also the interactions with other molecules, which can provide potential therapeutic strategies. Generally, RNA folding is hierarchical: the secondary structure forms first, driven by strong base pairing/stacking interactions, and the tertiary structure then folds through the aggregation of secondary segments and the formation of tertiary contacts [3]. Since the base pairing/stacking interactions are very strong, the secondary structure of RNAs can be relatively stable. Accordingly, RNA structure prediction can be divided into secondary structure prediction and tertiary structure prediction. Since the 1980s, many efforts have been devoted to secondary structure prediction based on experimentally derived parameters, and great progress has been made in the accurate prediction of RNA secondary structure [4–8]. However, RNAs are often biologically functional in their 3-dimensional structures. The lack of experimentally derived structures and the high cost of determining structures experimentally have motivated computational modelling for predicting RNA structures [9–38]. Tertiary (including 3-dimensional) structure prediction has attracted much attention, and important progress has been achieved in recent years; this is the focus of the first part of this chapter.

Second, in addition to static structures, RNA folding kinetics is also directly tied to RNA biological functions. Experiments suggest that alternative conformations of the same RNA sequence perform different functions [39–41]. The capability of RNA molecules to form multiple (metastable) conformations for different functions is probably used by nature to regulate the versatile functions of RNA. Furthermore, it was found that the folding of functional structures is controlled by folding kinetics rather than by equilibrium thermodynamics. The mechanisms of ribozymes [42, 43], anti-HIV RNA aptamers [44–46], gene expression regulators such as miRNA, siRNA and riboswitches [47–53], and other RNAs are often kinetically controlled. For instance, self-induced riboswitches regulate RNA functions by limiting biologically functional properties of RNA structures to certain time windows. The hok/sok system of plasmid R1 [54, 55] regulates plasmid maintenance through mRNA conformational rearrangements into different functional forms. For riboswitches, it has been proposed that a transient intermediate structure of the RNA can regulate transcription and translation by creating the time window that is necessary for regulatory reactions to occur [56]. After the early work of Porschke [57] and Crothers [58, 59], extensive kinetic experiments, such as temperature-jump, single-molecule and time-resolved NMR spectroscopy experiments, have been employed to study RNA [60–64] and DNA [65–67] folding kinetics. Recent progress in understanding RNA/DNA folding kinetics is the subject of the second part of the chapter.

Third, due to the polyanionic nature of RNAs, RNA folding causes a massive build-up of negative charge [68–73] and a strong intra-chain Coulombic repulsion. However, the folding RNA attracts metal ions from solution, and the resulting ion binding effectively reduces the electrostatic energy barrier and stabilizes the folded RNA structure. Therefore, RNA folding, including the folded structure, folding kinetics, and stability, is strongly coupled to ion electrostatic interactions [1, 68–73]. The third part of the chapter focuses on recent progress in the qualitative and quantitative understanding of the roles of ions in RNA folding.

In the following, we review recent progress in RNA folding in these three aspects: structure prediction, folding kinetics, and ion electrostatics.

2 RNA Structure Prediction

As described above, RNA structure prediction can be divided into two levels: secondary structure prediction and tertiary structure prediction. For predicting RNA secondary structure, many computational models have been proposed based on experimental thermodynamic parameters [4–8], such as Mfold, which uses free energy minimization [4], the Vienna RNA package, which uses a dynamic programming algorithm [5], and Sfold, which samples structures with Boltzmann statistics [6]; see Ref. [8] for a recent review of RNA secondary structure prediction. In the following, we focus on RNA tertiary structure prediction methods, which can be classified into three types: knowledge-based structure modelling, physics-based structure modelling, and knowledge/physics-hybridized structure modelling; see Table 11.1 for a summary of the algorithms for RNA tertiary structure prediction [9–38]. Other reviews are also available [74–79].

Table 11.1 The algorithms for RNA three-dimensional/tertiary structure prediction

2.1 Knowledge-Based Structure Prediction

With the rapid increase in the RNA structure data deposited in the Protein Data Bank (PDB) and the Nucleic Acid Database (NDB), knowledge-based modelling is becoming an important method for predicting RNA structures from available sequences. Knowledge-based modelling relies on the database of experimentally solved structures and on empirically observed structural similarities between identical or similar sequences.

2.1.1 Graphics-Based Method

One kind of knowledge-based modelling is the graphics-based method, which usually involves interactive (user-guided) manipulation of RNA structures based on the assembly of fragments (motifs) derived from various experimental structures. It includes the algorithms MANIP [9], S2S/Assemble [11, 12], and RNA2D3D [13].

MANIP. Massire and Westhof developed the program MANIP [9], which allows users to rapidly assemble isolated motifs (each with a specified sequence) into a complex three-dimensional architecture. As an interactive tool, MANIP provides a toolbox with a variety of tools that help the user to design a three-dimensional structure model. MANIP constitutes a quick and easy way to model small- to large-size structured RNAs, and its use of multiple connections and pairing tables opens perspectives for further development, allowing, for instance, the precise modelling of RNA-protein interactions.

S2S/Assemble. Jossinet et al. [11] developed the program S2S (sequence to structure), in which a user can conveniently display, manipulate and interconnect heterogeneous RNA data. Assemble [12], a tool complementary to S2S, is an intuitive graphical interface to analyze, manipulate and build complex 3D RNA architectures. S2S/Assemble combines various tools and web services into a powerful package for editing RNA sequences and structures. It contains explicit annotation of base pairing and stacking interactions, multiple sequence alignments, a motif library and an automatic procedure to generate 3D models from the annotation. However, all interactions have to be annotated manually, and thus it is difficult to perform a high-throughput analysis.

RNA2D3D. Using the primary sequence and secondary structure information of an RNA, the program RNA2D3D [13] automatically and rapidly produces an initial 3-dimensional conformation consistent with the available information. In the next step, the overlaps in the initial 3D structure model are removed and conformational changes are made towards the targeted features. Subsequently, the user refines the model through interactive graphical editing and special tools such as compacting, stem-stacking, segment-positioning and energy-refinement. The most important advantage of RNA2D3D is that it is applicable to structures with arbitrary branching and pseudoknots. The algorithm has been verified in the modelling of ribozymes, viral kissing loops, and viral internal ribosome entry sites.

Clearly, none of these methods is fully automatic. Graphics-based modelling requires users to set up and refine the RNA structures according to specific principles, and thus requires expert knowledge from the user [9–13].

2.1.2 Homology-Based Modelling

Another kind of knowledge-based modelling is the homology-based modelling, i.e., comparative modelling, based on the empirical observation that evolutionarily related macromolecules usually retain similar 3D structure despite the divergence on the sequence level [81]. Several algorithms have been developed based on the homology-based modelling, such as ModeRNA [14] and RNABuilder [15].

ModeRNA. As a minimal input, ModeRNA [14] requires the 3D coordinates of a template structure and a pairwise sequence alignment between the template and the RNA to be modelled. ModeRNA provides a flexible scripting framework that can build RNA structures with various strategies, including fast automated modelling based on the template structure and the target–template alignment without additional data. ModeRNA was tested on 99 tRNAs with experimentally solved structures: each was modelled as a target using each of the other 98 structures as a template, giving RMSD values around 5.0 Å.

RNABuilder. Recently, Flores et al. developed RNABuilder [15], now known as MMB (a contraction of MacroMolecule Builder), for the comparative modelling of RNA structures. It generates RNA structures by treating the kinematics and the forces separately. The coarse-grained force field used in this approach for an alignment consists of forces and torques which act to bring the interacting bases into the base pairing geometry specified by the user. RNABuilder has been used to predict the structure of the 200-nucleotide Azoarcus group I intron without using any information from the solved Azoarcus intron crystal structure. The model accurately depicts the global topology and the secondary and tertiary connections, and gives an overall RMSD value of 4.6 Å relative to the crystal structure.

Homology-based modelling can be used to predict any RNA molecules no matter how large or small, as long as the user can find a template and an effective alignment between the template and the target [14, 15, 79]. So this method is also called template-based modelling. Although the PDB/NDB database covers many important families, it may be difficult to find a proper template RNA for a particular target. In addition, creating an accurate and biologically relevant target–template sequence alignment is also a critical issue [79, 80].

2.2 Physics-Based Structure Prediction

The physics-based (ab initio) approach is based on the thermodynamic hypothesis [82]: the conformation with the lowest free energy corresponds to the native structure. A full-atomic structural model of RNA has a large number of degrees of freedom, which results in a huge computational complexity. To simplify the physical description, several coarse-grained prediction models have been proposed at different resolution levels [16].

2.2.1 One-Bead Coarse-Grained Model

The one-bead model uses one bead to represent a nucleotide, thus significantly reducing the number of spatial degrees of freedom of an RNA structure. Several such algorithms have been developed for predicting RNA 3D structures, such as YUP [17] and NAST [18].

YUP. Yammp Under Python (YUP) [17] is a general-purpose molecular mechanics program for multi-scaled coarse-grained modelling, in which Python is used as a programming/scripting language. It can be used to model RNA structures as well as DNA and protein structures by extending the Python language through adding three new data types (atom maps, atom vectors and numerous energy types). In general, YUP is an extendable and useful tool for multi-scale modelling, but its potentials are required to be changed by the user according to the problem at hand. In addition, a fragment-based approach is used to add full-atomic details to the coarse-grained structure in YUP.

NAST. The Nucleic Acid Simulation Tool (NAST) [18] is a molecular dynamics simulation tool for predicting the 3D structure of large RNA molecules based on their secondary structures. Three types of data are also used to rank the conformational clusters produced from molecular dynamics simulations: (1) ideal small-angle X-ray scattering (SAXS) data; (2) experimental and ideal solvent accessibility (SAS) data; and (3) NAST energy (statistical information). NAST has been tested by building structural models for two RNA molecules, the yeast tRNAPhe (76-nt) and the P4-P6 domain of the Tetrahymena thermophila group I intron (158-nt), with average RMSDs of 8.0 ± 0.3 and 16.3 ± 1.0 Å, respectively. Recently, the authors also developed a fully automated fragment- and knowledge-based method, called C2A (Coarse to Atomic) [19], to add full atomic details to coarse-grained models.

Both YUP and NAST are successful for large RNA molecules at nucleotide level, but they are limited by their prior need for secondary structure and the information of some tertiary contacts derived from both experimental and computational methods.

2.2.2 Three-Bead Coarse-Grained Model

Beyond the one-bead models [17–20], a number of coarse-grained models with higher resolution have been developed, such as three-bead [21, 24], five-bead [27] and six- to seven-bead models [28].

Vfold. Cao and Chen have developed a three-vector virtual bond-based RNA folding model (Vfold) [21] for predicting RNA 3D tertiary folds from the sequence without using the experimental constraints. In Vfold, the loop conformations are produced by the self-avoiding random walks of the virtual bonds on a diamond lattice [22, 23] and the conformational entropy of RNA structures can be calculated. The Vfold model has been tested by a systematic benchmark including a wide range of RNA motifs (such as hairpin, duplex), pseudoknot, and a large RNA (a 122-nt 5S rRNA domain) with rmsd of about 3.5, 6.0 and 7.4 Å, respectively. Due to the rigid lattice constraints, Vfold is inadequate to study the folding dynamics of RNA.

iFoldRNA. iFoldRNA [25] is a web-based methodology for RNA 3D structure prediction and analysis of RNA folding thermodynamics. It is based on discrete molecular dynamics (DMD) and a force field that includes base-pairing, base-stacking and loop entropy terms [24]. iFoldRNA has been tested by simulating a set of 153 RNA molecules, to within an average deviation of 4 Å from the experimental structures. Despite its rapid conformational sampling, the CPU time for DMD simulations still depends on RNA length. Ding et al. recently reported a qualitative structure refinement approach that uses hydroxyl radical probing (HRP) measurements to drive DMD simulations for large RNA molecules (80–230 nt) with complex topologies [26].

The physics-based approaches not only emphasize the necessity of an accurate understanding of RNA tertiary structure but also illustrate the importance of native state dynamics. Although many models have been applied [16–28], how to build and choose proper force fields is still a challenge.

2.3 Knowledge/Physics-Hybridized Structure Prediction

In protein structure prediction, the most successful approach is hybrid (de novo) modelling, which combines the features of physics-based folding with the use of previously solved structures [83–85]. Hybrid (de novo) modelling relies strongly on the structural information from databases [79], and several existing programs for RNA structure prediction are based on this principle [29–34].

FARNA/FARFAR. Fragment assembly of RNA (FARNA) [31] is developed to predict RNA 3D structure from a sequence, while fragment assembly of RNA with full-atom refinement (FARFAR) [32] adds a refinement with atomic-level interactions to optimize RNA structures generated by FARNA. Based on knowledge-based energy function, FARNA can assemble three-nucleotide all-atom fragments with Monte Carlo algorithm. In a benchmark test of 20 RNA molecules (≤46 nt), FARNA reproduces better than 90 % of Watson-Crick base pairs. Smaller RNAs in the test are accurately reproduced with a resolution of better than 4.00 Å, but the probability of a FARNA prediction within a backbone rmsd of 4.00 Å decreases sharply as a function of RNA length. Nevertheless, combined with secondary structure and multiplexed hydroxyl radical cleavage analysis (MOHCA), FARNA can predict the structure for an RNA as large as 158 nt with the rmsd of 13 Å [32].

MC-Fold/MC-Sym. The MC-Fold/MC-Sym pipeline [34] is another full-atomic RNA 3D structure prediction algorithm, which assembles RNA structures from a library of nucleotide cyclic motifs (NCMs) [35]. MC-Fold predicts RNA secondary structure using a free energy minimization function, and MC-Sym builds full-atom 3D models of RNA structures based on the scripts generated by MC-Fold and the 3D versions of the NCM fragments. The predictive power of the pipeline has been confirmed by building 3D structures of precursor microRNA (pre-miRNA), and by proposing a new 3D structure of the human immunodeficiency virus (HIV-1) cis-acting −1 frame-shifting element.

Knowledge/physics-hybridized modelling, including the FARNA/FARFAR and MC-Fold/MC-Sym pipelines, is powerful for modelling small RNA molecules [29–32], but larger structures remain a challenge because of the computational requirements of full-atomic modelling. A coarse-grained approach would decrease the computational requirements for modelling large structures.

3 RNA Folding Kinetics

3.1 Kinetic Model

Most approaches to kinetic RNA folding are based on the description of folding in terms of a stochastic process. Each model consists of three key ingredients: (1) The state space, i.e., the set of structures or conformations, (2) a move-set, i.e., the elementary transitions that can occur between such conformations, and (3) transition rates for each of these allowed transitions. The folding process can now be described as a continuous time Markov process, governed by a master equation for the state probabilities.

Consider an ensemble of conformational states. The population \(p_{i}(t)\) of each state i at time t is described by the following master equation:

$$\frac{dp_{i}}{dt} = \sum\limits_{j = 1}^{\varOmega } {\left[ {k_{j \to i} p_{j} (t) - k_{i \to j} p_{i} (t)} \right]} ,$$

where \(\varOmega\) is the total number of conformations, and \(k_{j \to i}\) and \(k_{i \to j}\) are the rates of the respective transitions. The rates should satisfy the detailed balance condition \(p_{i} k_{i \to j} = p_{j} k_{j \to i}\), where \(p_{i}\) and \(p_{j}\) are the equilibrium (Boltzmann) populations of states i and j, \(p_{i} = \frac{1}{Z}\exp ( - \frac{\Delta G_{i}}{k_{B} T})\), and Z is the partition function, \(Z = \sum\nolimits_{i} \exp ( - \frac{\Delta G_{i}}{k_{B} T})\).

When \(\varOmega\) is not very large, the above rate equation can be written in matrix form [86]: \(d{\mathbf{p}}/dt = {\mathbf{M}} \cdot {\mathbf{p}}\), where p is the vector of the populations and M is the \(\varOmega \times \varOmega\) rate matrix with elements \(M_{ij} = k_{j \to i}\;(i \ne j)\) and \(M_{ii} = - \sum\nolimits_{j \ne i} k_{i \to j}\). The equation can be solved analytically, and the population kinetics is given in terms of the eigenvalues and eigenvectors of the rate matrix, with the long-time behaviour governed by the slowest modes:

$$\mathbf{p}(t) = \sum\limits_{m = 1}^{\varOmega } {C_{m} \mathbf{n}_{m} e^{ - \lambda_{m} t} } , \tag{11.1}$$

where \(- \lambda_{m}\) and \(\mathbf{n}_{m}\) are the m-th eigenvalue and eigenvector of the rate matrix, and \(C_{m}\) is the coefficient determined by the initial condition.
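As an illustration of Eq. (11.1), the following minimal sketch builds the rate matrix for a toy three-state system and evaluates the populations from its eigen-decomposition. The free energies, the value of \(k_{B}T\), the prefactor and the function names are assumptions made for the example, and the rate rule \(k_{j \to i} = k_{0} \exp [ - (\Delta G_{i} - \Delta G_{j})/2k_{B}T]\) between every pair of states is used only because it satisfies detailed balance; it is not meant to reproduce the move sets of the cited models.

```python
import numpy as np

KT = 0.616  # k_B*T in kcal/mol near 310 K (assumed units)


def rate_matrix(dG, k0=1.0, kT=KT):
    """Rate matrix with M[i, j] = k_{j->i} (i != j) and M[i, i] = -sum_j k_{i->j}."""
    dG = np.asarray(dG, dtype=float)
    # Assumed rate rule that obeys detailed balance; all states are connected.
    M = k0 * np.exp(-(dG[:, None] - dG[None, :]) / (2.0 * kT))
    np.fill_diagonal(M, 0.0)
    M -= np.diag(M.sum(axis=0))  # column sum = total escape rate of each state
    return M


def populations(M, p0, t):
    """Evaluate p(t) = sum_m C_m n_m exp(-lambda_m t)."""
    eigvals, eigvecs = np.linalg.eig(M)  # eigvals are the -lambda_m
    # Coefficients C_m follow from the initial condition p(0) = sum_m C_m n_m.
    C = np.linalg.solve(eigvecs, np.asarray(p0, dtype=float))
    return (eigvecs @ (C * np.exp(eigvals * t))).real


dG = [0.0, -1.0, -2.0]  # toy free energies (kcal/mol), hypothetical values
M = rate_matrix(dG)
p_long = populations(M, [1.0, 0.0, 0.0], t=1e3)

boltzmann = np.exp(-np.array(dG) / KT)
boltzmann /= boltzmann.sum()
```

Starting from all population in state 0, the long-time populations `p_long` should relax to the Boltzmann distribution, because the only mode that survives is the one with zero eigenvalue.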

Because the passage of a rate-limiting step is intrinsically related to the folding speed, it is possible to probe and to identify the rate-limiting steps through the folding from different unfolded initial conformations. In a master equation approach, slow and fast folding speeds are directly correlated to the large and small contributions of the rate-limiting slow kinetic modes. Because the contributions from the slow modes can be computed from the corresponding eigenvectors, Zhang and Chen proposed a general transition state searching method to identify the rate-limiting steps from the eigenvectors of the slow modes [87].

When \(\varOmega\) is large and there exist discrete rate-limiting steps in the kinetic process, it is possible to “renormalize” the conformational space into a number of conformational clusters. The large ensemble of chain conformations can thus be drastically reduced to a much smaller number of conformational clusters [88, 89]. Different clusters are separated by the rate-limiting steps. If the rate-limiting steps involve sufficiently high kinetic barriers, the microstates within each cluster have sufficient time to equilibrate and form a macrostate (in local equilibrium) before crossing the intercluster barriers to enter other kinetically neighboring clusters. The transitions between different clusters (macrostates) then determine the overall folding kinetics of the molecule. Otherwise, the process needs to be simulated with Monte Carlo methods [90–93]. Because such simulations suffer from drawbacks such as limited sampling and slow calculation, several methods have been applied to remedy them. For example, a rejectionless Monte Carlo approach was used to conserve the detailed balance condition [94], simulated annealing techniques were used to accelerate folding [95], and optimization techniques such as genetic algorithms were used instead of Monte Carlo simulation [96].
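To make the idea of a rejectionless Monte Carlo simulation concrete, the sketch below implements a generic Gillespie-type step: a move is chosen with probability proportional to its rate and the waiting time is drawn from an exponential distribution, so no trial move is ever rejected. This is a textbook construction given under our own assumptions, not the specific algorithm of Ref. [94]; the two-state rate table and the function names are hypothetical.

```python
import math
import random


def gillespie_step(state, rates, rng):
    """Pick the next state with probability proportional to its rate; return (next_state, dt)."""
    moves = rates[state]                      # dict: neighbour -> k_{state->neighbour}
    total = sum(moves.values())
    dt = -math.log(1.0 - rng.random()) / total   # exponential waiting time
    threshold = rng.random() * total
    cumulative = 0.0
    for nxt, k in moves.items():
        cumulative += k
        if threshold < cumulative:
            return nxt, dt
    return nxt, dt                            # guard against floating-point round-off


def simulate(rates, start, t_end, seed=0):
    """Return the fraction of the time interval [0, t_end] spent in each state."""
    rng = random.Random(seed)
    state, t = start, 0.0
    occupancy = {s: 0.0 for s in rates}
    while t < t_end:
        nxt, dt = gillespie_step(state, rates, rng)
        occupancy[state] += min(dt, t_end - t)   # clip the last interval at t_end
        t += dt
        state = nxt
    return {s: occ / t_end for s, occ in occupancy.items()}


# Hypothetical two-state system: unfolded (U) and folded (F), rates in arbitrary units.
toy_rates = {"U": {"F": 3.0}, "F": {"U": 1.0}}
fractions = simulate(toy_rates, "U", t_end=20000.0, seed=1)
```

For this toy system the time fraction spent in F should approach the equilibrium value 3/(3 + 1) = 0.75, up to statistical noise.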

3.2 Conformation Space

The base pair is the basic subunit of RNA secondary structure, so the formation or melting of a base pair corresponds to the smallest possible step in conformation space [90]. Considering that RNA secondary structure is stabilized mainly by base stacking interactions, and that a single (unstacked) base pair is not stable and can quickly unfold, Zhang and Chen [86] defined an elementary kinetic step for RNA secondary structural change to be the formation/disruption of a stack or a stacked base pair. While this allows the most detailed description of folding pathways, the conformation space is so large that simulations either become extremely long or are restricted to short sequences. To reduce the conformation space, many approaches therefore define the formation or destruction of an entire helix as the basic step [93, 97–100]. Another approach is to change several uncorrelated base pairs in a single time step [101]. Folding simulations in this scenario are similar to those with single base pair moves. However, the relationship between helices differs from that between base pairs [102]: two helices can be compatible, partially compatible, or incompatible. A conformational state should consist of compatible or partially compatible helices [97–100]. Recently, thermodynamics-based RNA folding has been considered in a kinetic folding context. Coarse-grained landscapes in conjunction with stochastic sampling algorithms have been used to study RNA folding kinetics [103]. By using barrier trees and assuming that the basins of individual local minima are in quasi-equilibrium, the folding kinetics under transcription was studied [104]. Another approach for cotranscriptional folding combined thermodynamic computations with coarse-grained local kinetics [105]. Flamm et al. developed a flooding algorithm that decomposes the landscape into basins surrounding local minima connected by saddle points [106]. Wolfinger et al. [107] partition the landscape into macrostates, where a macrostate is defined as the set of all starting conformations for which a gradient walk ends in the same local minimum. The effective transition rates between any two macrostates were calculated from the barrier tree. Tang et al. [103, 108] adopt probabilistic roadmaps to build an approximate representation of the RNA folding landscape. In the roadmap graph, the vertices represent valid sampled conformations of the folding landscape and the edges represent possible transition paths; the time evolution of the populations of different conformations can be calculated through the probabilistic roadmap.
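The macrostate definition by gradient walks can be sketched in a few lines. The example below is only a schematic illustration on a hypothetical one-dimensional toy landscape: the energies, the neighbour table and the function names are assumptions, whereas in a real application the states would be secondary structures and the neighbours would be given by the chosen move set.

```python
def gradient_walk(state, energy, neighbours):
    """Follow the steepest-descent neighbour until a local minimum is reached."""
    while True:
        best = min(neighbours[state], key=lambda s: energy[s], default=state)
        if energy[best] >= energy[state]:
            return state          # no neighbour is lower: local minimum
        state = best


def macrostates(energy, neighbours):
    """Group states by the local minimum in which their gradient walk ends."""
    basins = {}
    for s in energy:
        basins.setdefault(gradient_walk(s, energy, neighbours), []).append(s)
    return basins


# Hypothetical toy landscape: six states on a line with two local minima (0 and 4).
energy = {0: 0.0, 1: 1.0, 2: 2.0, 3: 1.5, 4: -1.0, 5: 0.5}
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
basins = macrostates(energy, neighbours)
```

In this toy case the states 0 to 2 drain into the minimum at 0 and the states 3 to 5 drain into the minimum at 4, so the six microstates reduce to two macrostates.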

3.3 Move Set and Kinetic Rate Models

3.3.1 Kinetic Rate Models

The kinetic rate for an elementary kinetic step is usually defined as:

$$k_{i \to j} = k_{0} \exp \left( - \frac{\Delta G^{ + } - \Delta G_{i} }{k_{B} T} \right),$$

where \(\Delta G^{+}\) is the free energy of the transition state, \(\Delta G_{i}\) is the free energy of state i, and \(k_{0}\) is a constant. The actual models for the base pair kinetic move use:

$$k = k_{0} \exp \left( - \frac{\Delta G}{2k_{B} T} \right),$$

where ΔG is the free energy difference between the two states. Schmitz and Steger [95] treat the stacking energy as the barrier when opening a base pair, and the loop energy change as the barrier when closing a base pair. Zhang and Chen [86] define the transition rates for the formation (\(k_{+}\)) and the disruption (\(k_{-}\)) of a base stack as follows:

$$k_{ + } = k_{0} \exp ( - \frac{{\Delta G_{ + } }}{{k_{B} T}}),\quad k_{ - } = k_{0} \exp ( - \frac{{\Delta G_{ - } }}{{k_{B} T}}),$$

where the prefactor \(k_{0}\) is fitted from experimental data and is equal to \(6.6 \times 10^{12} \,{\text{s}}^{ - 1}\) and \(6.6 \times 10^{13}\, {\text{s}}^{ - 1}\) for an AU and a GC stack, respectively [109], \(k_{B}\) is the Boltzmann constant, T is the temperature, and \(\Delta G_{\pm}\) is the free energy barrier for the respective transition. The barrier for the formation of a stack is assumed to be caused by the reduction in entropy, \(\Delta G_{ + } = T\Delta S\). If the stack closes a loop, the formation of the stack is accompanied by a concomitant entropic decrease for loop closure; thus, the kinetic barrier for loop closure is \(\Delta G_{ + } = T\Delta S = T(\Delta S_{loop} +\Delta S_{stack} )\), where \(\Delta S_{loop}\) is the entropy of the loop and \(\Delta S_{stack}\) is the entropy of the stack. The barrier for the disruption of a base pair is assumed to be caused by the energetic (enthalpic) cost ΔH to break the hydrogen bonding and the base stacking interactions: \(\Delta G_{ - } = \Delta H_{stack}\), where \(\Delta H_{stack}\) is the enthalpy of the stack. The rates for the formation and disruption of a stack are then:

$$\begin{aligned} k_{ + } & = k_{0} e^{ - T\Delta S_{stack} /k_{B} T} ; \\ k_{ - } & = k_{0} e^{ - \Delta H/k_{B} T} \\ \end{aligned}$$

respectively, and the rates for formation and disruption of a loop-closing stack (and the loop) are:

$$k_{ + }^{loop} = k_{0} \exp ( - T(\Delta S_{loop} +\Delta S_{stack} )/k_{B} T),\quad k_{ - }^{loop} = k_{0} \exp ( -\Delta H/k_{B} T).$$
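The following short sketch evaluates these stack rates numerically from the barriers \(\Delta G_{+} = T\Delta S\) and \(\Delta G_{-} = \Delta H\) defined above. The sign convention (entropy reductions and the enthalpic cost entered as positive numbers), the units, the numerical inputs and the function name are assumptions made for the illustration; only the prefactor \(6.6 \times 10^{12}\,\text{s}^{-1}\) for an AU stack is taken from the text.

```python
import math

KB = 1.987e-3  # Boltzmann (gas) constant in kcal/(mol K)


def stack_rates(dS_stack, dH_stack, T=310.15, k0=6.6e12, dS_loop=0.0):
    """Return (k_plus, k_minus) in 1/s for forming and disrupting a base stack.

    dS_stack, dS_loop: entropy reductions, given as positive numbers in kcal/(mol K).
    dH_stack: enthalpic cost of breaking the stack, positive, in kcal/mol.
    """
    dG_plus = T * (dS_loop + dS_stack)   # entropic barrier for formation
    dG_minus = dH_stack                  # enthalpic barrier for disruption
    k_plus = k0 * math.exp(-dG_plus / (KB * T))
    k_minus = k0 * math.exp(-dG_minus / (KB * T))
    return k_plus, k_minus


# Hypothetical stack: 25 cal/(mol K) entropy loss, 10 kcal/mol enthalpy.
k_plus, k_minus = stack_rates(dS_stack=0.025, dH_stack=10.0)
# Same stack closing a loop with an additional (hypothetical) entropy loss.
k_plus_loop, k_minus_loop = stack_rates(dS_stack=0.025, dH_stack=10.0, dS_loop=0.010)
```

With this construction the ratio `k_plus / k_minus` equals \(\exp [(\Delta H - T\Delta S)/k_{B} T]\), and closing a loop slows the formation rate while leaving the disruption rate unchanged.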

3.3.2 Model to Calculate the Transition Rate for Helix Based Moves

Tacker et al. [110] proposed a rate model similar to that described for single base pairs: the activation energy is the change in loop energies when forming a helix, and the stacking free energy when opening a helix. The same approach was adopted in Refs. [111, 112]. Isambert [113] proposed that the free energy barrier for helix formation is the entropic penalty incurred by inserting the nucleus, and the rate of helix formation is then given by an Arrhenius law for nucleation. Zhao et al. [102] calculate the rates of the helix move set from those of the stack move set. If two conformations differ only in one helix, the transition between them is the formation or disruption of that helix. After the first stack is closed (with the concurrent formation of a loop), the helix is assumed to form along the zipping pathway, and the rate of helix formation can be estimated along this pathway. From the empirical thermodynamic parameters [114, 115], it can be found that for most RNA helices, the free energy landscape for a zipping pathway shows a downhill profile after the formation of the third base stack. Therefore, the rate \(k_{f}\) of helix formation (along a specific pathway) is equal to the rate for the formation of the three-stack state. Considering the (slow) breaking of the stacks, for zipping along the 1 → 2 → 3 pathway in Fig. 11.1:

$$k_{f} = k_{12} K_{1} \left( 1 - K_{2}^{\prime } K_{1}^{\prime } \sum\limits_{n = 0}^{\infty } {(K_{2}^{\prime } K_{1} )^{n} } \right) = k_{12} K_{1} \left( 1 - \frac{K_{2}^{\prime } K_{1}^{\prime } }{1 - K_{2}^{\prime } K_{1} } \right), \tag{11.2}$$

where \(k_{ij}\) denotes the rate for the transition from state i to state j, \(K_{1}\) and \(K_{1}^{\prime}\) are the forward (state 2 → 3) and reverse (state 2 → 1) probabilities of state 2, and \(K_{2}\) and \(K_{2}^{\prime}\) are the forward (state 3 → 5 and 3 → 6) and reverse (state 3 → 2) probabilities of state 3,

$$K_{1} = \frac{k_{23} }{k_{23} + k_{21} + k_{24} },\quad K_{1}^{\prime } = \frac{k_{21} }{k_{23} + k_{21} + k_{24} },\quad K_{2} = \frac{k_{35} + k_{36} }{k_{32} + k_{35} + k_{36} },\quad K_{2}^{\prime } = \frac{k_{32} }{k_{35} + k_{36} + k_{32} }. \tag{11.3}$$

Fig. 11.1 Multiple pathways for the formation of a helix after the first (nucleation) stack has formed

For a given RNA molecule, the first base stack can be formed anywhere inside the helix. Therefore, the net rate \(k_{F}\) for the formation of a helix is the sum of the rates (Eq. 11.2) along the two pathways (Fig. 11.1) with the different first (nucleation) base stacks. The rate for the disruption of the helix can be estimated from the equilibrium constant of the helix: \(k_{U} = k_{F} e^{\Delta G/k_{B} T}\), where ΔG is the folding free energy of the helix.
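As a worked illustration, Eqs. (11.2) and (11.3) and the relation for \(k_{U}\) can be transcribed directly into a few lines of code. The sketch below uses the closed form of Eq. (11.2); the function names, the value of \(k_{B}T\) and all numerical rates are hypothetical and serve only to show how the quantities combine.

```python
import math


def zipping_rate(k12, k21, k23, k24, k32, k35, k36):
    """Rate k_f of helix formation along one zipping pathway (closed form of Eq. 11.2)."""
    K1 = k23 / (k23 + k21 + k24)     # forward probability of state 2
    K1r = k21 / (k23 + k21 + k24)    # reverse probability of state 2 (K_1')
    K2r = k32 / (k35 + k36 + k32)    # reverse probability of state 3 (K_2')
    return k12 * K1 * (1.0 - K2r * K1r / (1.0 - K2r * K1))


def helix_rates(pathways, dG, kT=0.616):
    """Net folding rate k_F (sum over nucleation pathways) and unfolding rate k_U.

    dG is the folding free energy of the helix (negative for a stable helix),
    in the same units as kT (kcal/mol assumed here).
    """
    k_F = sum(zipping_rate(**rates) for rates in pathways)
    k_U = k_F * math.exp(dG / kT)
    return k_F, k_U


# Hypothetical elementary rates (arbitrary units) for two nucleation pathways.
path = dict(k12=2.0, k21=1.0, k23=3.0, k24=0.0, k32=0.0, k35=1.0, k36=1.0)
k_F, k_U = helix_rates([path, path], dG=-5.0)
```

In the limit where the three-stack state never reopens (`k32 = 0`), the expression reduces to \(k_{f} = k_{12} K_{1}\), which gives a quick sanity check of the transcription.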

If two helices A and B overlap with each other, they cannot coexist in the same structure. The conversion of helix A to helix B through complete unfolding of helix A followed by refolding to B is extremely slow due to the high energy barrier to disrupt all the base stacks in helix A. Zhao et al. [102] proposed that there is a much faster tunneling pathway, which consists of three processes: (1) partial disruption, in which helix A is first partially disrupted; (2) exchange, in which the disruption of a base stack in A is accompanied by the concurrent formation of a base stack in B; and (3) zipping, in which helix B grows through a zipping process. The pathway is fast because the formation of the base stacks in B tends to cause an overall downhill shape of the free energy landscape. This is similar to the Morgan-Higgs saddle point approach [116], in which the saddle point height is estimated as the highest point along the path. However, the free energy landscape suggests that there exist multiple high free energy points along the path. This (tunneling) pathway involves a much lower energy barrier for unwinding the helix than the complete unfolding pathway (Fig. 11.2). Based on the tunneling pathway, the rate for helix exchange is estimated as:

$$k_{A \to B} = \frac{\prod\nolimits_{i = 1}^{n} {k_{i} } }{\sum\nolimits_{j = 0}^{n - 1} {\left( \prod\nolimits_{i = 1}^{j} {k_{i}^{\prime } } \prod\nolimits_{m = j + 2}^{n} {k_{m} } \right)} },\quad k_{B \to A} = k_{A \to B} e^{ - \frac{\Delta G_{AB} }{k_{B} T}} . \tag{11.4}$$

Fig. 11.2 The free energy landscape of the tunneling pathway that connects two overlapping helices A and B. U is the open state. The unfolding of A is accompanied by the folding of B. \(k_{1}\) denotes the transition rate for the unfolding of helix A to form the first stack of helix B. \(k_{1}^{\prime } ,k_{2} ,k_{2}^{\prime } , \ldots ,k_{n}\) denote the transition rates between the neighboring intermediates along the tunneling pathway

In the above formula, \({k_{n} }\) and \({k_{n}^{\prime } }\) are the rate constants for the formation (disruption) and the disruption (formation) of a base stack in helix A (B), respectively.
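As a minimal numerical sketch of Eq. (11.4), the Python functions below evaluate the helix exchange rate from two lists of step rates along the tunneling pathway of Fig. 11.2. The function names, the list-based indexing and the units of the Boltzmann constant are illustrative assumptions and are not taken from Ref. [102]; the reverse rate simply follows the sign convention of Eq. (11.4) as printed.

```python
from math import exp, prod

K_B = 0.0019872  # Boltzmann constant in kcal/(mol K) (assumed energy unit)


def exchange_rate(k_fwd, k_rev):
    """Rate k_{A->B} of Eq. (11.4).

    k_fwd = [k_1, ..., k_n]       rates of the steps leading from A towards B
    k_rev = [k'_1, ..., k'_{n-1}] rates of the corresponding backward steps
    """
    n = len(k_fwd)
    if n == 0 or len(k_rev) != n - 1:
        raise ValueError("need n forward rates and n - 1 backward rates")
    numerator = prod(k_fwd)
    denominator = 0.0
    for j in range(n):  # j = 0, ..., n - 1
        backward = prod(k_rev[:j])      # k'_1 ... k'_j (empty product = 1)
        forward = prod(k_fwd[j + 1:])   # k_{j+2} ... k_n (empty product = 1)
        denominator += backward * forward
    return numerator / denominator


def reverse_exchange_rate(k_ab, dg_ab, temperature=310.15):
    """Rate k_{B->A} = k_{A->B} exp(-dG_AB / k_B T), with dG_AB in kcal/mol."""
    return k_ab * exp(-dg_ab / (K_B * temperature))
```

For a two-step pathway the expression reduces to \(k_{1} k_{2} /(k_{2} + k_{1}^{\prime })\), so a fast backward step \(k_{1}^{\prime }\) lowers the net exchange rate, as expected for a sequential process.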

When the conformational space consists of local minima, the transition rate is often calculated by searching for the saddle point or from the free energy barrier tree [103–105].

3.3.3 Folding Kinetics During Transcription

The folding of functional RNA structures is often coupled with the transcription process [117–119]: since transcription is slow compared with local folding processes, the partially synthesized RNA starts folding while the molecule is still being synthesized. For instance, in the auto-catalyzed splicing reaction of the Tetrahymena group I intron, the functional native structure may form within the timescale of transcription, which is much faster than the refolding of the complete chain in vitro [120–125]. Investigations of the folding of the RNA component of Bacillus subtilis RNase P indicate that site-specific pausing can greatly influence the folding outcome of the RNA molecule [126]. The addition of NusA, which causes pausing during transcription, gives the transient RNA chain a longer time to undergo the conformational search. Recently, several RNA folding kinetics algorithms have been developed in connection with the thermodynamic energetics of the folding system. For instance, by using barrier trees and assuming that the basins of individual local minima are in quasi-equilibrium, the folding kinetics under transcription was studied [104]. By combining the thermodynamic properties with coarse-grained local folding kinetics, a heuristic approach was also developed to successfully predict cotranscriptional folding for large RNAs [105]. Zhao et al. [127] treat the transcription of a single nucleotide as an elementary time step. The real time for each step is constant or variable if the nucleotides are synthesized at a constant or variable speed, respectively. Transcriptional pausing at a specific site can be simulated by assigning a large number of effective time steps to the corresponding (paused) step. If the transcription speed of an RNA sequence is ν nucleotides per second, the (real) time window for each step is 1/ν seconds, i.e., the polymerase spends 1/ν seconds to synthesize a nucleotide.
Assuming that the l-nt chain is (newly) transcribed at time t, the population distribution over the conformational space of the l-nt chain relaxes from \([P_{1} (l)_{begin} ,P_{2} (l)_{begin} , \ldots ,P_{\varOmega } (l)_{begin} ]\) to \([P_{1} (l)_{end} ,P_{2} (l)_{end} , \ldots ,P_{\varOmega } (l)_{end} ]\) between time t and time \({t + 1/\nu }\), when the (l + 1)-th nucleotide is transcribed; here Ω is the number of conformations of the l-nt chain. The beginning population of the (l + 1)-th step is inherited from the ending population of the l-th step. However, the RNA chain in the (l + 1)-th step is one nucleotide longer than in the l-th step. According to the possible changes of the structures upon elongation of the chain by one nucleotide, the structures are classified into four types (Fig. 11.3). The population distribution at the beginning of step l + 1 can be derived from that at the end of step l: \(P(l + 1)_{begin} = P(l)_{end}\) for types a, b, and c; \(P(l + 1)_{begin} = 0\) for type d. Applying this method from the first step to the end of transcription, we compute the folding kinetics of the RNA chain during transcription.
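The relax-then-inherit scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the conformations are indexed by integers, the rate matrices are supplied by the caller, the relaxation uses a plain explicit Euler integration of the master equation, and the names `relax`, `inherit`, `cotranscriptional_fold` and `pause_factor` are hypothetical rather than part of the method of Ref. [127].

```python
def relax(populations, rates, duration, n_substeps=1000):
    """Integrate dP_i/dt = sum_j (k_ji P_j - k_ij P_i) with explicit Euler steps.

    rates[i][j] is the transition rate from conformation i to conformation j.
    The substep duration / n_substeps must be small compared with 1 / rate.
    """
    p = list(populations)
    h = duration / n_substeps
    size = len(p)
    for _ in range(n_substeps):
        flow = [0.0] * size
        for i in range(size):
            for j in range(size):
                if i != j:
                    flux = rates[i][j] * p[i]
                    flow[i] -= flux
                    flow[j] += flux
        p = [p[i] + h * flow[i] for i in range(size)]
    return p


def inherit(p_end, parent):
    """Map the end populations of step l onto the conformations of step l + 1.

    parent[c] is the index of the l-nt conformation that the (l + 1)-nt
    conformation c elongates (types a, b, c), or None for a structure that
    only becomes possible with the new nucleotide (type d, population 0).
    """
    return [0.0 if src is None else p_end[src] for src in parent]


def cotranscriptional_fold(p_begin, steps, speed):
    """Propagate populations through successive transcription steps.

    steps is a list of (rates, parent, pause_factor) tuples, one per chain
    length; parent is None for the final, full-length step. speed is in
    nucleotides per second, and pause_factor > 1 stretches the 1/speed window
    to mimic transcriptional pausing.
    """
    p = list(p_begin)
    for rates, parent, pause_factor in steps:
        p = relax(p, rates, pause_factor / speed)
        if parent is not None:
            p = inherit(p, parent)
    return p
```

In this sketch a paused site is represented by a larger `pause_factor`, which corresponds to assigning more effective time steps to that nucleotide.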

Fig. 11.3

The relationships between l-nt and (l + 1)-nt structures: elongation of an open chain (a), a dangling tail (b), a helix (c), and the formation of a new structure (d). The filled triangle denotes the last transcribed nucleotide in step l, and the square denotes the last transcribed nucleotide in step l + 1

4 Metal Ions in RNA Folding

4.1 Ions Stabilize RNA/DNA Folded Structure

4.1.1 Ion Binding to RNAs/DNAs

Metal ions tend to bind to negatively charged nucleic acids and thereby neutralize the charges of RNAs/DNAs [128–139]. The number of bound ions is important to DNA/RNA structure and stability, and can be measured by several experimental methods such as small angle X-ray scattering (SAXS) [128–139], the ion-counting method [132], and the thermodynamic method [133–138]. These experimental methods have been applied to various RNAs/DNAs, including the yeast 58-nt ribosomal RNA fragment [136], tRNA [137, 138], poly(A.U) [133], the beet western yellow virus pseudoknot fragment [135], polymeric calf thymus DNA [134], oligomeric DNA/RNA duplexes [132], and a DNA triplex [132]; see Ref. [72] for a collection of the experimental data on ion binding to RNAs/DNAs. The extensive experimental data have yielded the following major conclusions:

  1. (1)

    The detailed distributions of bound ions near the molecular surface are sensitive to the specific atomistic structure of the RNAs/DNAs [131], and the binding of different (monovalent and divalent) ion species appears anti-cooperative [132];

  2. (2)

    Metal ions neutralize the charge of RNA more efficiently than that of DNA. This difference possibly comes from the higher backbone charge density of the A-form helix compared with the B-form helix [131].

  3. (3)

    Multivalent ions (e.g., Mg2+) are much more efficient in charge neutralization than monovalent ions (e.g., Na+). Such unusually higher efficiency of multivalent ions is beyond the mean-field description such as ionic strength, and is more pronounced for larger RNAs with more compact structures.

In addition to diffusive ion binding, specific ion binding may make a significant contribution to stabilizing specific RNA folded structures and to RNA function. Specific ion binding may be related to the RNA sequence, the local geometry, and the properties of the ion and water, and is a challenge for both experiments and modelling.

4.1.2 Ion Contribution to Flexibility of Single-Stranded RNA

Single-stranded (ss) RNA is a fundamental segment in RNA structure, and the flexibility of ssRNA is important to the global stability of RNA. The ion contribution to the flexibility (e.g., persistence length \(l_{p}\)) of single-stranded RNAs has been quantified by a variety of experimental approaches, such as force-extension measurements, single-molecule fluorescence resonance energy transfer (smFRET), small angle X-ray scattering (SAXS), and fluorescence recovery after photobleaching, over different kinds of ssRNAs/DNAs [140–147]. The major conclusions are as follows:

  1. (1)

    Mg2+ is approximately 60-120 times more efficient than Na+ in neutralizing ss RNAs/DNAs, which is beyond the mean-field concept (e.g., ionic strength) [145];

  2. (2)

    The persistence length of ss RNA/DNA decreases with the increase of [Na+] or [Mg2+], and the ion-concentration dependence is stronger for [Na+] than for [Mg2+] [145, 146];

  3. (3)

    The ion dependence of the persistence length of ss RNA/DNA is stronger for longer sequences. For a long generic ss sequence, there is a crude empirical formula for the ion-dependent persistence length: \(l_{p} = 5 + 1.5/\sqrt I\), where I is the ionic strength [147].

  4. (4)

    Poly(A)/poly(U) are stiffer than poly(T) at high salt, while the ion dependence of \(l_{p}\) is similar. This stronger intrinsic stiffness may result from the stronger self-stacking of poly(A)/poly(U) [144, 145].

However, open questions remain: (1) How do the flexibility of an ss RNA/DNA and its Na+/Mg2+ dependence depend on the surrounding space? (2) How can the dependence of the Na+/Mg2+-dependent \(l_{p}\) on the sequence length be quantified? Answering these questions remains a challenge because of the large conformational fluctuations of the ss chain and the possibly stronger correlations between Mg2+ ions.

4.1.3 Ions Stabilize Helices and Hairpins

A helix is the most fundamental segment of RNA structure (ranging from several to about ten base pairs), and a hairpin is the simplest secondary structural motif. Thermodynamic experiments have revealed that the stability of helices and hairpins is sensitive to the ionic environment. Most of the experiments were performed in a Na+ solution or a mixed Na+/Mg2+ solution; see Ref. [72] for a brief summary of the experimental data [148–159]. These thermodynamic data lead to the following major features of ion effects on helix and hairpin stability:

  1. (1)

    In Na+ or K+ solution, the stabilities of DNA/RNA helices/hairpins depend approximately linearly on the logarithm of the Na+ or K+ concentration; this dependence is strong at low salt (<0.1 M Na+ or K+) and relatively weak at high salt (≥0.1 M Na+ or K+);

  2. (2)

    Compared with Na+ or K+, divalent ions (e.g., Mg2+) are more efficient in stabilizing helices/hairpins. For example, the stability of short DNA/RNA duplexes/hairpins in a 10 mM Mg2+ solution is approximately equivalent to that in a 1 M Na+ solution [139, 147–149].

The thermodynamic parameters for the formation of helices and loops have been measured extensively at 1 M Na+ (i.e., the standard ion condition). These parameters have enabled accurate predictions of RNA (DNA) secondary structure, stability and kinetics [160–164]. For ion conditions other than 1 M NaCl, RNA/DNA thermodynamic data and theoretical modelling for various ionic conditions yield a set of fitted formulas for the thermodynamic parameters of RNA/DNA helices as functions of the Na+/Mg2+ concentrations. In contrast to Na+ solutions, experimental data on Mg2+-dependent helix/hairpin stability have been relatively limited, and the [Mg2+]-dependent thermodynamic parameters [154, 155] may need to be validated against more extensive experimental data; see Sect. 11.4.3. For ion-dependent loop formation thermodynamics, the hairpin loop stability has been derived as a function of [Na+] and [Mg2+] based on statistical mechanical modelling [156]; see Sect. 11.4.3.

4.1.4 Ions Stabilize Tertiary Structures

Generally, an RNA tertiary structure is folded through the aggregation of secondary segments, minor rearrangements of secondary segments and the formation of tertiary contacts. Since RNA tertiary folding generally involves massive charge build-up, the ion-RNA interaction is stronger for tertiary structures, and consequently the quantitative understanding of ion effects in RNA tertiary folding becomes more challenging. Extensive experiments have investigated how metal ions assist RNA tertiary folding and stabilize tertiary structures for various RNAs, such as tRNA, the 58-nt ribosomal RNA fragment, the beet western yellow virus pseudoknot fragment, the Tetrahymena ribozyme, and kissing complexes; see Ref. [72] for a summary of the experimental data on ion effects in RNA tertiary folding [165–179]. These experiments have revealed the following major features of the effects of metal ions, especially Mg2+:

  1. (1)

    Metal ions of higher charge density (i.e. higher valence and smaller size) are more efficient in stabilizing RNA tertiary folds [166, 169]. For the Tar–tar RNA complex, smaller ions can enhance the folding stability [170].

  2. (2)

    Mg2+ can make a significant contribution to RNA tertiary structure stability even at high Na+ (or other monovalent ions) concentration, and Mg2+ can induce more compact folded structures than Na+.

  3. (3)

    For HIV-1 dimerization initiation signal (DIS) type kissing loop-loop complexes, the melting temperature shows much stronger ion-dependence than for the corresponding duplex of the same sequence at the kissing interface [174, 175].

  4. (4)

    The higher efficiency of Mg2+ over Na+ is much more pronounced for the kissing loop complex than for the duplex [174]. This may result from the significantly larger charge build-up upon loop-loop kissing.

In addition to the non-specific effects of metal ions described above, some experiments also suggest that, depending on the sequence and geometry, specific interactions of binding ions with the RNA could make a critical contribution to RNA tertiary structure [72, 135, 136]. The roles of specific binding ions remain poorly understood, which calls for more careful and extensive investigations, especially theoretical ones.

4.1.5 Ion-Mediated Structural Collapse

RNA structural collapse during tertiary folding often involves the helix-helix packing, and therefore, the helix-helix interaction is important for RNA tertiary folding. Rau and Parsegian have performed osmotic pressure measurements to quantify the ion-mediated interactions between long DNA helices [159, 160], leading to the following general conclusions:

  1. (1)

    Multivalent ions, such as Co3+, can induce effective attraction between DNA helices, while monovalent ions (e.g., Na+) can only screen the helix–helix electrostatic repulsion [180];

  2. (2)

    Certain types of divalent ions such as Mn2+ can induce an effective helix-helix attractive force [181], while other divalent ions such as Ca2+ cannot. Mg2+ in the presence of methanol could induce an effective helix-helix attraction [181]. The different roles of divalent ions might be attributed to their different binding affinities to different groups [1]. For example, Mn2+ tends to bind in the grooves, while Ca2+ tends to bind to the phosphate groups [1].

However, in realistic RNA structures, helices are generally very short (ranging from several to around ten base pairs). Ions may affect the effective interactions between short helices differently from those between long helices because of the greater rotational freedom and stronger end effects of short helices. Recent experiments on short helices [68, 182–185] lead to the following conclusions:

  1. (1)

    For a system of dispersed short DNA helices, the SAXS experiments suggest that Mg2+ of high concentration can induce effective helix-helix attraction through end-end stacking [183].

  2. (2)

    For a system of loop-tethered short helices, the SAXS experiments suggested a possible weak side-side helix-helix attraction in a Mg2+ solution of high concentration (~0.6 M) [68].

  3. (3)

    The experiments showed that the PB theory underestimates the efficiency of Mg2+ in RNA structural collapse by over 10 times [184].

  4. (4)

    In trivalent ion solutions, short DNA duplexes can become condensed while RNA duplexes remain soluble [185].

  5. (5)

    Mg2+ cannot condense long DNA duplexes but can condense short DNA triplexes in aqueous solution [181, 184].

However, for systems of short helices, further investigations are still required to clarify: (1) How do different ions (with different valences and sizes) cause different effective helix-helix interactions? (2) Is the relaxed state a randomly disordered state, a state with a certain order, or ion-specific?

4.2 Theoretical Modelling for Ion Electrostatics

To quantitatively explore the ion effects in RNA folding, several theoretical approaches have been developed. The application of these theories to ion-RNA (DNA) systems has significantly enhanced the quantitative understanding of the role of ions in RNA folding, as introduced in the following; see Ref. [186] for a review of the theoretical models.

4.2.1 Counterion Condensation Theory

The counterion condensation (CC) theory was developed for describing the interaction between ion and long DNA [187, 188]. In the theory, a DNA helix is approximated as a line-charge, and metal ions around DNA are either in the condensed state (near the DNA surface) or in the free state (away from the vicinity of DNA). The binding of an ion would decrease the electrostatic energy and simultaneously increase the ion entropic free energy. The competition between the two components (electrostatic energy and ion entropy) determines the thermodynamically equilibrium state.

The application of the CC theory to the effect of monovalent ions (e.g., Na+, K+) on DNA helix thermodynamics [160, 187] shows a linear dependence of the melting temperature \(T_{m}\) of a DNA helix on the logarithm of the monovalent ion concentration, in accordance with experimental measurements. For systems of multiple helices, the CC theory predicts that two DNA helices attract each other in both monovalent and multivalent salts. At lower salt concentration, the predicted attractive force becomes stronger while the two helices equilibrate at a larger separation [189]. These predictions are somewhat inconsistent with the experimental measurements [180, 181, 183] and computer simulations [190, 191] of nucleic acid helix-helix interactions.

Although the CC theory has achieved great success in the analysis of DNA thermodynamics, it still has serious shortcomings: (1) The CC theory cannot be strictly applied to RNAs with complex structures in finite salt solutions, since the theory is derived under the assumptions of an infinitely long line-charge structural model of DNA and an infinitely dilute ion concentration; (2) The CC theory ignores the fluctuations and correlations of condensed ions by assuming a uniform distribution of condensed ions along the DNA. Consequently, the theory may become invalid for multivalent ion solutions, where the correlations can be strong.

4.2.2 Poisson-Boltzmann Theory

The Gouy-Chapman theory and the Debye-Hückel theory are early, simplified versions of the Poisson-Boltzmann (PB) theory for different specific systems [186]. The PB equation can be derived from the Poisson equation for the mean electric potential ψ and a Boltzmann distribution for the diffusive ions in solution

$$\nabla \cdot [\varepsilon ({\mathbf{r}})\varepsilon_{0} {\nabla }\psi ({\mathbf{r}})] = - 4\pi \left[ {\rho_{f} + \sum\limits_{\alpha } {ec_{\alpha }^{0} } N_{AV} z_{\alpha } {\text{e}}^{{ - z_{\alpha } e\psi ({\mathbf{r}})/k_{B} T}} } \right],$$
(11.5)

where \(z_{\alpha } e\psi ({\mathbf{r}})\) is approximated as the electrostatic energy of a diffusive ion with ionic charge \(z_{\alpha } e\); ε is the dielectric constant; \(\rho_{f}\) is the charge density of the fixed charges in the biomolecules; and \(c_{\alpha }^{0}\) is the bulk concentration of ion species α. Over the past two decades, several algorithms have been developed to solve the PB equation numerically, and the PB theory has been widely used for the electrostatics of biomolecules in solution [192–196]. For the electrostatics of biomolecules in aqueous/monovalent ion solutions, comparisons with experiments show that the PB theory makes rather accurate predictions [e.g., 197].

However, a mean-field approximation is employed in deriving the PB equation, i.e., the diffusive charges (ions) obey a mean Boltzmann distribution based on the mean electric potential instead of the potential of mean force. As a result, (1) the PB theory ignores the fluctuations of ions in solution by assuming a mean ion distribution; (2) the PB theory ignores the ion-ion correlations by using the mean electric potential for diffusive ions rather than the potential of mean force; and (3) the PB theory ignores the finite ion size through the point-charge approximation. Therefore, the PB theory cannot make reliable predictions of the electrostatic interactions of nucleic acids in multivalent ion solutions, where ion-ion interactions can be strong [186, 198]. An important example is the (multivalent) ion-mediated like-charge interaction: the PB theory always predicts a repulsive force between two like-charged polyelectrolytes (DNA helices), while experiments have shown an attractive force in multivalent salts [180, 181]. For ion-mediated RNA structural collapse, the PB theory underestimates the efficiency of Mg2+ by over 10 times [179, 182].

4.2.3 Modified Models Beyond Mean-Field Approximation

Aiming to improve the predictions for polyelectrolyte-multivalent ion systems, many efforts have been made in recent years to overcome the shortcomings of the PB theory. Here, we introduce several major modified models beyond the mean-field approximation [198–207].

Size-modified Poisson-Boltzmann model. The simplest modification of the PB model is to incorporate the discrete ion size. Recently, a size-modified PB model was proposed based on the lattice gas formalism, in which the ion solution is discretized into a lattice of grid cells that can be occupied by ions of finite size [199, 200]. The application of the model to 3-dimensional complex nucleic acids shows that the modification improves the prediction of monovalent ion-binding profiles at high salt concentration by capturing the binding saturation effect. For RNA solutions with multivalent ions, however, the model still cannot give reliable predictions since it ignores the ion-ion electrostatic correlations [199, 200].

Modified Poisson-Boltzmann theory based on Kirkwood/BBGY hierarchy. A modified PB model has been developed based on the Kirkwood/BBGY hierarchy by taking into account the fluctuation potential and the ion-exclusion term in the potential of mean force for diffusive ions [e.g., 201]. Comparisons with computer simulations show that the theory gives improved predictions for multivalent ion distributions near polyelectrolytes with ideal 3D shapes such as cylinders, spheres and planes. For realistic nucleic acids (or proteins) with arbitrary 3D shapes, however, the numerical solution is computationally impractical because the equation for the fluctuation potential is coupled to the equation for the mean electrostatic potential [201, 202].

Correlation-corrected Poisson-Boltzmann model. Recently, a so-called correlation-corrected Poisson-Boltzmann model was developed to account for the ion-ion correlations by introducing an effective potential between like-charge ions [203]. This effective potential is the same as the Coulomb potential at large ion-ion separation, and becomes a reduced repulsive Coulomb potential at close ion-ion separation. For electric double layers, comparisons with computer simulations showed that the model makes improved predictions of multivalent ion distributions and predicts an attractive force between two planes in the presence of multivalent ions [203]. However, the model is computationally expensive for RNAs with complex structures. Moreover, the effective potential is somewhat ad hoc, and the model still lacks validation on the thermodynamics of nucleic acids.

Other theories beyond the mean-field approximation. Other theories beyond the mean-field approximation, such as the integral equation theory [204], the density functional theory [205] and the local molecular field theory [206], have been developed to account for the ion-ion correlation effects around nucleic acids/polyelectrolytes. However, because of the huge computational cost for realistic nucleic acid systems, these theories are also practically inapplicable. Recently, a tightly bound ion (TBI) theory was developed by explicitly treating the strongly correlated ions that reside in the vicinity of the nucleic acid surface [186, 198, 207–211]. Extensive comparisons with experiments showed that this theory makes improved predictions of the ion effects for various nucleic acid structures in the presence of Mg2+. In the following, we focus on the TBI theory and its applications to modelling ion effects on DNA/RNA structural stability, including helices, hairpins, tertiary folds, and assembly.

4.3 Tightly Bound Ion Theory

As described above, extensive experiments have shown that Mg2+ plays a special role in RNA folding: Mg2+ is much more efficient than Na+ in RNA folding, and Mg2+ can induce more compact structures than Na+. For example, Mg2+ is generally ~1,000 times more efficient than Na+ in RNA tertiary folding [169]. Aiming to quantitatively understand the role of multivalent ions in RNA folding, Tan and Chen developed the TBI theory, which accounts for ion correlations and fluctuations for realistic RNAs in ion solutions [186, 198, 207–211]. In the following, we introduce the theoretical framework of the TBI theory and its applications to modelling ion effects on RNA/DNA structural stability.

4.3.1 Framework of the Tightly Bound Ion Theory

Since RNAs/DNAs are highly charged polyanionic molecules, the positively charged metal ions in solutions would aggregate on nucleic acid surface, causing high ion concentration in the vicinity of RNA/DNA surface. These condensed ions of high concentration would interact (correlate) strongly with each other. The correlation strength between ions can be characterized by the coupling parameter Γ [186, 198]

$$\Gamma ({\mathbf{r}}) = \frac{{(z_{\alpha } e)^{2} }}{{\varepsilon a_{wz} ({\mathbf{r}})k_{B} T}} \ge\Gamma _{\text{c}} .$$
(11.6)

Previous studies have shown that for an ionic system, a change in the coupling parameter Γ can induce a gas-liquid transition, and the critical value Γc was shown to lie in the range [2.3, 2.9] [186, 198]. In the TBI theory, according to the critical inter-ion correlation strength Γc (2.6, the mean value over [2.3, 2.9]), the ions around RNAs/DNAs are divided into two types: (strongly correlated) tightly bound ions in the vicinity of the RNA and (weakly correlated) diffusive ions in the outer space. Correspondingly, the space around the RNA is divided into the tightly bound region and the diffusive region. Because of the weak inter-ion correlations, the diffusive ions can be treated by the mean-field PB approach. For the (strongly correlated) tightly bound ions, the tightly bound region is discretized into tightly bound cells, each around a (negatively charged) phosphate group. Every tightly bound cell can remain empty or be occupied by an ion, and all possible states of the tightly bound cells (either empty or occupied by an ion) give the ensemble of ion-binding modes. The different ion-binding modes (M) are explicitly considered to account for the effects of ion correlations and fluctuations.
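To make the criterion of Eq. (11.6) concrete, the following Python sketch evaluates Γ for a given ion valence and local ion concentration and compares it with Γc = 2.6. It rests on assumptions that are not stated in the text above: \(a_{wz}\) is taken to be the Wigner-Seitz radius fixed by the local concentration c through \((4\pi /3)a_{wz}^{3} c = 1\), the solvent is water with a relative permittivity of 78 at 298.15 K, and the Gaussian-unit expression of Eq. (11.6) is converted to SI units. All function names are hypothetical.

```python
from math import pi

E_CHARGE = 1.602176634e-19  # elementary charge, C
EPS0 = 8.8541878128e-12     # vacuum permittivity, F/m
K_B = 1.380649e-23          # Boltzmann constant, J/K
N_A = 6.02214076e23         # Avogadro constant, 1/mol
GAMMA_C = 2.6               # critical coupling used in the TBI theory


def bjerrum_length(eps_r=78.0, temperature=298.15):
    """e^2 / (4 pi eps0 eps_r k_B T) in metres (SI form of e^2 / (eps k_B T))."""
    return E_CHARGE**2 / (4.0 * pi * EPS0 * eps_r * K_B * temperature)


def wigner_seitz_radius(conc_molar):
    """Radius (m) of the sphere that holds one ion at the local concentration."""
    number_density = conc_molar * 1000.0 * N_A  # ions per cubic metre
    return (3.0 / (4.0 * pi * number_density)) ** (1.0 / 3.0)


def coupling_parameter(valence, conc_molar, eps_r=78.0, temperature=298.15):
    """Gamma = z^2 * l_B / a_wz, equivalent to Eq. (11.6) in SI units."""
    l_b = bjerrum_length(eps_r, temperature)
    return valence**2 * l_b / wigner_seitz_radius(conc_molar)


def is_tightly_bound(valence, conc_molar, eps_r=78.0, temperature=298.15):
    """True where the local coupling reaches the critical value GAMMA_C."""
    return coupling_parameter(valence, conc_molar, eps_r, temperature) >= GAMMA_C


if __name__ == "__main__":
    for name, z in (("Na+", 1), ("Mg2+", 2)):
        print(name, round(coupling_parameter(z, 1.0), 2), is_tightly_bound(z, 1.0))
```

Because Γ scales with the square of the valence, a divalent ion reaches the critical coupling at a much lower local concentration than a monovalent ion, which is consistent with treating Mg2+ near the nucleic acid surface as tightly bound.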

For a nucleic acid-ion system, the partition function Z is given by the summation of the partition functions \(Z_{M}\) over all possible modes M

$$Z = \sum\limits_{M} {Z_{M} } ,$$
(11.7)

where \(Z_{M}\) is given by [177, 186, 198, 208]

$$Z_{M} = Z^{id} (c_{z} )^{{N_{b} }} \left( {\int {\prod\limits_{i = 1}^{{N_{b} }} {d{\mathbf{R}}_{i} } } } \right)e^{{{{ - \left( {\Delta G_{b} +\Delta G_{d} +\Delta G_{b}^{pol} } \right)} \mathord{\left/ {\vphantom {{ - \left( {\Delta G_{b} +\Delta G_{d} +\Delta G_{b}^{pol} } \right)} {k_{B} T}}} \right. \kern-0pt} {k_{B} T}}}} .$$
(11.8)

Here, \(Z^{id}\) is the partition function for the uniform ion solution (without RNAs). \(N_{b}\) is the number of tightly bound ions for mode M. \(c_{z}\) is the bulk concentration of the z-valent ions, and \({\mathbf{R}}_{i}\) denotes the position of tightly bound ion i. \(\int {\mathop \prod \limits_{i = 1}^{{N_{b} }} } d{\mathbf{R}}_{i}\) is the volume integral over the tightly bound region for the \(N_{b}\) tightly bound ions. \(\Delta G_{b}\) is the free energy for the discrete charges in the tightly bound region (including the tightly bound ions and phosphate charges); \(\Delta G_{d}\) is the free energy for the diffusive ions, including the electrostatic interactions between the diffusive ions and between the diffusive ions and the charges in the tightly bound region, as well as the entropic free energy of the diffusive ions; \(\Delta G_{b}^{pol}\) is the (Born) self-polarization energy for the discrete charges within the tightly bound region [177, 186, 198, 208].
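The structure of Eqs. (11.7) and (11.8), a sum of Boltzmann-weighted terms over all binding modes, can be illustrated with a toy enumeration in Python. The sketch makes strong simplifying assumptions that are not part of the TBI theory: each tightly bound cell holds at most one ion, the volume integral is replaced by a fixed cell volume per bound ion, the total free energy of a mode is supplied by a user-defined function that does not depend on the ion positions within the cells, and the result is expressed in units of \(Z^{id}\). All names are hypothetical.

```python
from itertools import product
from math import exp


def enumerate_modes(n_cells):
    """Yield every binding mode as a tuple of 0 (empty) / 1 (occupied) flags."""
    return product((0, 1), repeat=n_cells)


def mode_weight(mode, c_z, cell_volume, free_energy, kT):
    """Z_M / Z_id for one mode, following the form of Eq. (11.8).

    c_z * cell_volume must be dimensionless (e.g. number density times volume);
    free_energy(mode) returns dG_b + dG_d + dG_b^pol in the units of kT.
    """
    n_b = sum(mode)
    return (c_z * cell_volume) ** n_b * exp(-free_energy(mode) / kT)


def partition_function(n_cells, c_z, cell_volume, free_energy, kT=0.593):
    """Z / Z_id as the sum over all 2**n_cells modes, Eq. (11.7)."""
    return sum(
        mode_weight(mode, c_z, cell_volume, free_energy, kT)
        for mode in enumerate_modes(n_cells)
    )


def mean_bound_ions(n_cells, c_z, cell_volume, free_energy, kT=0.593):
    """Ensemble-averaged number of tightly bound ions."""
    total = 0.0
    weighted = 0.0
    for mode in enumerate_modes(n_cells):
        w = mode_weight(mode, c_z, cell_volume, free_energy, kT)
        total += w
        weighted += sum(mode) * w
    return weighted / total
```

The number of modes grows as \(2^{N}\) with the number of cells N, so a brute-force enumeration of this kind is only feasible for very short chains.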

The TBI theory has been widely applied to quantitatively understanding the ion contributions to RNA secondary and tertiary structure stability, which will be described in the following.

4.3.2 Modelling Ion Effects on the Stability of DNA/RNA Helices

The stability of helices is essential to the global stability and the functions of RNAs (DNAs). Because of the polyanionic nature of nucleic acids, metal ions can be important to the stability of DNA/RNA helices. Based on a polyelectrolyte theory, the melting of a helix can be modelled as a two-state transition. The free energy change due to the melting can be decoupled into a non-electrostatic contribution \(\Delta G^{NE}\) and an electrostatic contribution \(\Delta G^{E}\), and \(\Delta G^{NE}\) can be evaluated by combining the experimental data at a reference state with a polyelectrolyte theory (e.g. the TBI theory) [154, 155]

$$\begin{aligned}\Delta G & =\Delta G^{E} +\Delta G^{NE} \\ & =\Delta G^{E} + \left( {\Delta G_{{ 1 {\text{M Na}}^{ + } }} -\Delta G_{{ 1 {\text{M Na}}^{ + } }}^{E} } \right). \\ \end{aligned}$$
(11.9)

With the use of the TBI theory for treating ion-DNA(RNA) interactions, the Na+/Mg2+ dependence of helix stability can be quantitatively evaluated.

The comparisons with the extensive experimental data showed that the TBI theory makes reliable predictions on the stability of DNA and RNA helices in Na+/Mg2+ solutions [154, 155]. Furthermore, the comprehensive calculations with the TBI theory give a series of empirical formulas for describing the thermodynamics of DNA (RNA) helices in Na+/Mg2+ solutions.

Thermodynamic parameters for DNA helix in Na+/Mg2+ solution.

For a DNA helix in Na+ solution, the following formulas of Na+-dependent thermodynamics can be obtained from the TBI theory [154]

$$\begin{aligned}\Delta G[{\text{Na}}^{ + } ] & =\Delta G[1\;{\text{M}}\;{\text{Na}}^{ + } ] + (N - 1)\Delta g_{1}^{DNA} ; \\\Delta S[{\text{Na}}^{ + } ] & =\Delta S[1\;{\text{M}}\;{\text{Na}}^{ + } ] - 3.22(N - 1)\Delta g_{1}^{DNA} ; \\ 1/T_{m} [{\text{Na}}^{ + } ] & = 1/T_{m} [1\;{\text{M}}\;{\text{Na}}^{ + } ] - 0.00322(N - 1)\Delta g_{1}^{DNA} /\Delta H[1\;{\text{M}}\;{\text{Na}}^{ + } ], \\ \end{aligned}$$
(11.10)

where ΔG, ΔS, \(T_{m}\), and ΔH are the free energy change, entropy change, melting temperature, and enthalpy change for helix formation at the given [Na+] (in molar) or at 1 M [Na+] (standard ion condition). \(\Delta g_{1}^{DNA}\) is associated with the electrostatic folding free energy per base stack of DNA, and is a function of the helix length N and [Na+] [154]

$$\begin{aligned}\Delta g_{1}^{DNA} & = a_{1}^{DNA} + b_{1}^{DNA} /N; \\ a_{1}^{DNA} & = - 0.07\ln [{\text{Na}}^{ + } ] + 0.012\ln^{2} [{\text{Na}}^{ + } ]; \\ b_{1}^{DNA} & = 0.013\ln^{2} [{\text{Na}}^{ + } ]. \\ \end{aligned}$$
(11.11)

The thermodynamics for DNA helix at any given [Na+] can be calculated through the above empirical formulas, since those at 1 M [Na+] (standard ion condition) can be obtained from the nearest neighbor model with the experimental parameters of SantaLucia et al. [160]. The quantitative comparisons with extensive experimental data show that the empirical formulas can give rather accurate estimates for thermodynamics of DNA helices in Na+ solutions [154].
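Equations (11.10) and (11.11) translate directly into code. The Python sketch below is a plain transcription under assumed units: ΔG and ΔH in kcal/mol, ΔS in cal/(mol K), and \(T_{m}\) in kelvin, which is consistent with the factor 3.22 ≈ 1000/310.15 that converts a free energy shift at 37 °C into an entropy shift at fixed ΔH. The function names are hypothetical, and the reference values at 1 M Na+ must be supplied by the caller, e.g. from the nearest neighbor model.

```python
from math import log


def delta_g1_dna(na, n_bp):
    """Eq. (11.11): Na+-dependent term per base stack of DNA; na in molar."""
    ln_na = log(na)
    a1 = -0.07 * ln_na + 0.012 * ln_na**2
    b1 = 0.013 * ln_na**2
    return a1 + b1 / n_bp


def helix_thermo_na(na, n_bp, dg_1m, ds_1m, dh_1m, tm_1m):
    """Eq. (11.10): shift the 1 M Na+ values of a DNA helix to the given [Na+].

    Assumed units: dg_1m, dh_1m in kcal/mol; ds_1m in cal/(mol K); tm_1m in K.
    Returns (dG, dS, Tm) at the requested Na+ concentration.
    """
    shift = (n_bp - 1) * delta_g1_dna(na, n_bp)
    dg = dg_1m + shift
    ds = ds_1m - 3.22 * shift
    tm = 1.0 / (1.0 / tm_1m - 0.00322 * shift / dh_1m)
    return dg, ds, tm


if __name__ == "__main__":
    # Hypothetical reference values for a 10-bp helix, for illustration only.
    print(helix_thermo_na(0.1, 10, dg_1m=-10.0, ds_1m=-200.0, dh_1m=-72.0, tm_1m=330.0))
```

At [Na+] = 1 M the logarithms vanish and the function returns the reference values unchanged; lowering [Na+] makes ΔG less negative and lowers \(T_{m}\) for a helix with negative ΔH.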

For the thermodynamics of a DNA helix in Mg2+ solution, the TBI model gives the following similar empirical formulas [154]

$$\begin{aligned}\Delta G[{\text{Mg}}^{2 + } ] & =\Delta G[1\;{\text{M}}\;{\text{Mg}}^{2 + } ] + (N - 1)\Delta g_{2}^{DNA} ; \\\Delta S[{\text{Mg}}^{2 + } ] & =\Delta S[1\;{\text{M}}\;{\text{Mg}}^{2 + } ] - 3.22(N - 1)\Delta g_{2}^{DNA} ; \\ 1/T_{m} [{\text{Mg}}^{2 + } ] & = 1/T_{m} [1\;{\text{M}}\;{\text{Mg}}^{2 + } ] - 0.00322(N - 1)\Delta g_{2}^{DNA} /\Delta H[1\;{\text{M}}\;{\text{Mg}}^{2 + } ], \\ \end{aligned}$$
(11.12)

where \(\Delta g_{2}^{DNA}\) is given by

$$\begin{aligned}\Delta g_{2}^{DNA} & = a_{2}^{DNA} + b_{2}^{DNA} /N^{2} ; \\ a_{2}^{DNA} & = 0.02\ln [{\text{Mg}}^{2 + } ] + 0.0068\ln^{2} [{\text{Mg}}^{2 + } ]; \\ b_{2}^{DNA} & = 1.18\ln [{\text{Mg}}^{ 2+ } ] + 0.344\ln^{2} [{\text{Mg}}^{ 2+ } ]. \\ \end{aligned}$$
(11.13)

Through the above empirical formulas, the thermodynamics of a DNA helix at a given [Mg2+] can be calculated easily. The experimental comparisons show that the empirical formulas can make reliable estimates for the stability of a short DNA helix ranging from 6-bp to 30-bp at an arbitrary [Mg2+] [154].
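A corresponding sketch for Eqs. (11.12) and (11.13) is given below. It is a direct transcription with hypothetical function names; the reference-state values (`dg_ref`, `ds_ref`, `dh_ref`, `tm_ref`) are whatever reference values enter Eq. (11.12) and must be provided by the caller, with the same assumed units as above (kcal/mol, cal/(mol K), and K). The length check reflects the 6-bp to 30-bp range quoted in the text and is an added safeguard rather than part of the formulas.

```python
from math import log


def delta_g2_dna(mg, n_bp):
    """Eq. (11.13): Mg2+-dependent term per base stack of DNA; mg in molar."""
    ln_mg = log(mg)
    a2 = 0.02 * ln_mg + 0.0068 * ln_mg**2
    b2 = 1.18 * ln_mg + 0.344 * ln_mg**2
    return a2 + b2 / n_bp**2


def helix_thermo_mg(mg, n_bp, dg_ref, ds_ref, dh_ref, tm_ref):
    """Eq. (11.12): shift the reference values of a DNA helix to the given [Mg2+].

    Returns (dG, dS, Tm); raises ValueError outside the 6-30 bp range for
    which the text reports reliable estimates.
    """
    if not 6 <= n_bp <= 30:
        raise ValueError("formulas were compared with experiments for 6-30 bp")
    shift = (n_bp - 1) * delta_g2_dna(mg, n_bp)
    dg = dg_ref + shift
    ds = ds_ref - 3.22 * shift
    tm = 1.0 / (1.0 / tm_ref - 0.00322 * shift / dh_ref)
    return dg, ds, tm


if __name__ == "__main__":
    # Length dependence of the per-stack term at 10 mM Mg2+.
    for n_bp in (6, 10, 30):
        print(n_bp, round(delta_g2_dna(0.01, n_bp), 4))
```

Because the length-dependent part enters as \(b_{2}^{DNA} /N^{2}\), the per-stack correction decays faster with helix length than the 1/N term of the Na+ formula.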

Generally, a buffer contains both Na+ (or K+) and Mg2+ ions. The TBI theory also gives the empirical formulas for DNA helix thermodynamics in a mixed Na+/Mg2+ solution [155]

$$\begin{aligned}\Delta G & =\Delta G[1\;{\text{M}}\;{\text{Na}}^{ + } ] + (N - 1)\left( {x_{duplex}\Delta g_{1}^{DNA} + (1 - x_{duplex} )\Delta g_{2}^{DNA} } \right) +\Delta g_{12} ; \\\Delta S & =\Delta S[1\;{\text{M}}\;{\text{Na}}^{ + } ] - 3.22\left( {(N - 1)\left( {x_{duplex}\Delta g_{1}^{DNA} + (1 - x_{duplex} )\Delta g_{2}^{DNA} } \right) +\Delta g_{12} } \right); \\ 1/T_{m} & = 1/T_{m} [1\;{\text{M}}\;{\text{Na}}^{ + } ] - 0.00322\left( {(N - 1)\left( {x_{duplex}\Delta g_{1}^{DNA} + (1 - x_{duplex} )\Delta g_{2}^{DNA} } \right) +\Delta g_{12} } \right)/\Delta H[1\;{\text{M}}\;{\text{Na}}^{ + } ], \\ \end{aligned}$$
(11.14)

where \(x_{duplex}\) stands for the contribution fraction from Na+, and \(\Delta g_{12}\) is a crossing term. \(x_{duplex}\) and \(\Delta g_{12}\) are given by

$$\begin{aligned} x_{duplex} & = \frac{{[{\text{Na}}^{ + } ]}}{{\left( {[{\text{Na}}^{ + } ] + (8.1 - 32.4/N)(5.2 - \ln [{\text{Na}}^{ + } ])[{\text{Mg}}^{2 + } ]} \right)}}; \\\Delta g_{12} & = - 0.6x_{duplex} (1 - x_{duplex} )\ln [{\text{Na}}^{ + } ]\ln \left( {(1/x_{duplex} - 1)[{\text{Na}}^{ + } ]} \right)/N, \\ \end{aligned}$$
(11.15)

where [Na+] and [Mg2+] are both in molar. Comparisons with experimental data show that the above formulas give good estimates for the thermodynamics of a DNA helix in mixed Na+/Mg2+ solutions [155].
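A short sketch of the mixing rule in Eqs. (11.14) and (11.15) is given below. The function names are illustrative, and the guard for the Mg2+-free limit is an added assumption: it returns the limiting value 0 of the crossing term as \(x_{duplex} \to 1\), which avoids evaluating the logarithm of zero.

```python
import math

def x_duplex(na, mg, n_bp):
    """Na+ contribution fraction of Eq. (11.15); concentrations in mol/L."""
    return na / (na + (8.1 - 32.4 / n_bp) * (5.2 - math.log(na)) * mg)

def delta_g12(na, mg, n_bp):
    """Crossing term of Eq. (11.15)."""
    x = x_duplex(na, mg, n_bp)
    if x >= 1.0:  # no effective Mg2+ contribution: limiting value (assumption)
        return 0.0
    return (-0.6 * x * (1.0 - x) * math.log(na)
            * math.log((1.0 / x - 1.0) * na) / n_bp)

def mixed_shift(na, mg, n_bp, dg1, dg2):
    """Bracketed salt term shared by the three lines of Eq. (11.14).

    dg1 and dg2 are the pure-Na+ and pure-Mg2+ corrections,
    evaluated separately by the caller.
    """
    x = x_duplex(na, mg, n_bp)
    return (n_bp - 1) * (x * dg1 + (1.0 - x) * dg2) + delta_g12(na, mg, n_bp)

# Illustrative conditions: 10-bp helix, 0.1 M Na+ and 0.01 M Mg2+.
print(x_duplex(0.1, 0.01, 10))
print(delta_g12(0.1, 0.01, 10))
```

The returned shift is added to \(\Delta G[1\;{\text{M}}\;{\text{Na}}^{+}]\) and enters \(\Delta S\) and \(1/T_{m}\) with the factors −3.22 and −0.00322/\(\Delta H\), as in Eq. (11.14).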

Thermodynamic parameters for RNA helix in Na+/Mg2+ solution.

For an RNA helix in a Na+ solution, the thermodynamics can also be formulated by Eq. (11.10), except that \(\Delta g_{1}^{DNA}\) needs to be replaced by \(\Delta g_{1}^{RNA}\). \(\Delta g_{1}^{RNA}\) can be given by [155]

$$\begin{aligned}\Delta g_{1}^{RNA} & = a_{1}^{RNA} + b_{1}^{RNA} /N; \\ a_{1}^{RNA} & = - 0.075\ln [{\text{Na}}^{ + } ] + 0.012\ln^{2} [{\text{Na}}^{ + } ]; \\ b_{1}^{RNA} & = 0.018\ln^{2} [{\text{Na}}^{ + } ]. \\ \end{aligned}$$
(11.16)

The combination of Eq. (11.10) and Eq. (11.16) gives a good estimate of the thermodynamics of an RNA helix in a Na+ solution, as shown in Ref. [155].

Similarly, for an RNA helix in a Mg2+ solution, the thermodynamics can be described by Eq. (11.12), except that \(\Delta g_{2}^{DNA}\) needs to be changed into \(\Delta g_{2}^{RNA}\)

$$\begin{aligned}\Delta g_{2}^{RNA} & = a_{2}^{RNA} + b_{2}^{RNA} /N^{2} ; \\ a_{2}^{RNA} & = - 0.6/N + 0.025\ln [{\text{Mg}}^{2 + } ] + 0.0068\ln^{2} [{\text{Mg}}^{2 + } ]; \\ b_{2}^{RNA} & = \ln [{\text{Mg}}^{ 2+ } ] + 0.38\ln^{2} [{\text{Mg}}^{ 2+ } ]. \\ \end{aligned}$$
(11.17)

For an RNA helix in a mixed Na+/Mg2+ solution, the thermodynamics can be calculated by the following empirical formulas

$$\begin{aligned}\Delta G & =\Delta G[1\;{\text{M}}\;{\text{Na}}^{ + } ] + (N - 1)\left( {x_{duplex}\Delta g_{1}^{RNA} + (1 - x_{duplex} )\Delta g_{2}^{RNA} } \right) +\Delta g_{12} ; \\\Delta S & =\Delta S[1\;{\text{M}}\;{\text{Na}}^{ + } ] - 3.22\left( {(N - 1)\left( {x_{duplex}\Delta g_{1}^{RNA} + (1 - x_{duplex} )\Delta g_{2}^{RNA} } \right) +\Delta g_{12} } \right); \\ 1/T_{m} & = 1/T_{m} [1\;{\text{M}}\;{\text{Na}}^{ + } ] - 0.00322\left( {(N - 1)\left( {x_{duplex}\Delta g_{1}^{RNA} + (1 - x_{duplex} )\Delta g_{2}^{RNA} } \right) +\Delta g_{12} } \right)/\Delta H[1\;{\text{M}}\;{\text{Na}}^{ + } ], \\ \end{aligned}$$
(11.18)

where \(x_{duplex}\) and \(\Delta g_{12}\) are given by Eq. (11.15), and \(\Delta g_{1}^{RNA}\) and \(\Delta g_{2}^{RNA}\) are given by Eqs. (11.16) and (11.17), respectively. As shown in Ref. [155], the predictions from Eq. (11.18) are quite reliable for the thermodynamics of an RNA helix in a mixed Na+/Mg2+ solution (Fig. 11.4).
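For reference, the RNA corrections of Eqs. (11.16) and (11.17) can be sketched in the same way. The function names are illustrative and the concentrations are assumed to be in molar.

```python
import math

def delta_g1_rna(na, n_bp):
    """Na+ correction of Eq. (11.16); [Na+] in mol/L (assumed unit)."""
    ln_na = math.log(na)
    a1 = -0.075 * ln_na + 0.012 * ln_na ** 2
    b1 = 0.018 * ln_na ** 2
    return a1 + b1 / n_bp

def delta_g2_rna(mg, n_bp):
    """Mg2+ correction of Eq. (11.17); [Mg2+] in mol/L (assumed unit)."""
    ln_mg = math.log(mg)
    a2 = -0.6 / n_bp + 0.025 * ln_mg + 0.0068 * ln_mg ** 2
    b2 = ln_mg + 0.38 * ln_mg ** 2
    return a2 + b2 / n_bp ** 2

# Illustrative conditions for a 10-bp RNA helix.
print(delta_g1_rna(0.1, 10))
print(delta_g2_rna(0.01, 10))
```

These two values take the place of the DNA corrections in the mixed-salt expression of Eq. (11.18). Note that, as written, Eq. (11.17) contains the concentration-independent term −0.6/N, so \(\Delta g_{2}^{RNA}\) evaluates to −0.6/N rather than zero at 1 M Mg2+.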

4.3.3 Modelling Ion Effects in Stability of DNA/RNA Hairpins

An RNA/DNA hairpin consists of a helix stem and a hairpin loop. The Na+/Mg2+ dependence of a helix can be quantified by the empirical formulas described above. In combination with the virtual bond model for the single-stranded loop conformation, the TBI model can also give empirical analytical formulas for the Na+/Mg2+-dependent thermodynamics of a single-stranded loop [156].

For a loop formation in Na+ solutions, the systematic calculations of the TBI model give the following empirical relation for the folding free energy of an N-nt loop with end-to-end distance l [156]:

$$\Delta G[{\text{Na}}^{ + } ] = - k_{B} T\left( {a_{1} \ln (N - l/d + 1) + b_{1} (N - l/d + 1)^{2} - b_{1} - \left( {c_{1} N - d_{1} } \right)} \right),$$
(11.19)

where d = 6.4 Å. The coefficients \(a_{1}\), \(b_{1}\), \(c_{1}\), and \(d_{1}\) are given by

$$\begin{aligned} a_{1} & = (0.02N - 0.026)\ln [{\text{Na}}^{ + } ] + 0.54N + 0.78; \\ b_{1} & = ( - 0.01/(N + 1) + 0.006)\ln [{\text{Na}}^{ + } ] - 7/(N + 1)^{2} - 0.01; \\ c_{1} & = 0.07\ln [{\text{Na}}^{ + } ] + 1.8; \\ d_{1} & = 0.21\ln [{\text{Na}}^{ + } ] + 1.5. \\ \end{aligned}$$
(11.20)

For a loop in Mg2+ solutions, the empirical formulas from the TBI theory for the folding free energy can be written as

$$\Delta G[{\text{Mg}}^{2 + } ] = - k_{B} T\left( {a_{2} \ln (N - l/d + 1) + b_{2} (N - l/d + 1)^{2} - b_{2} - \left( {c_{2} N - d_{2} } \right)} \right),$$
(11.21)

where \(a_{2}\), \(b_{2}\), \(c_{2}\), and \(d_{2}\) are given by

$$\begin{aligned} a_{2} & = ( - 1/(N + 1) + 0.32)\ln [{\text{Mg}}^{2 + } ] + 0.7N + 0.43; \\ b_{2} & = 0.0002(N + 1)\ln [{\text{Mg}}^{2 + } ] - 5.9/(N + 1)^{2} - 0.003; \\ c_{2} & = 0.067\ln [{\text{Mg}}^{2 + } ] + 2.2; \\ d_{2} & = 0.163\ln [{\text{Mg}}^{2 + } ] + 2.53. \\ \end{aligned}$$
(11.22)

For a loop in mixed Na+/Mg2+ solutions, the folding free energy is represented by

$$\Delta G[{\text{Na}}^{ + } /{\text{Mg}}^{2 + } ] = x_{loop}\Delta G[{\text{Na}}^{ + } ] + (1 - x_{loop} )\Delta G[{\text{Mg}}^{2 + } ],$$
(11.23)

where \(x_{loop}\) stands for the contribution fraction from Na+ and is given by

$$x_{loop} = \frac{{[{\text{Na}}^{ + } ]}}{{[{\text{Na}}^{ + } ] + (7.2 - 20/N)(40 - \ln [{\text{Na}}^{ + } ])[{\text{Mg}}^{2 + } ]}}.$$
(11.24)

With the above formulas for loop formation, the folding thermodynamics of hairpin, bulge, and internal loops in an arbitrary Na+/Mg2+ solution can be easily calculated [156].
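The loop formulas in Eqs. (11.19) to (11.24) can be sketched as follows. This is an illustrative implementation under stated assumptions: free energies are returned in units of \(k_{B}T\), concentrations are in molar, the end-to-end distance l is in Å, and the Mg2+-free case is handled by returning the pure-Na+ value. All function names are hypothetical.

```python
import math

D_BOND = 6.4  # Å, the length d in Eq. (11.19)

def _bracket(a, b, c, d, n_nt, l_end):
    """Common bracket of Eqs. (11.19) and (11.21)."""
    y = n_nt - l_end / D_BOND + 1.0
    return a * math.log(y) + b * y ** 2 - b - (c * n_nt - d)

def loop_dg_na(na, n_nt, l_end):
    """Loop free energy in Na+ (Eqs. 11.19 and 11.20), in k_B T."""
    ln_na = math.log(na)
    a1 = (0.02 * n_nt - 0.026) * ln_na + 0.54 * n_nt + 0.78
    b1 = (-0.01 / (n_nt + 1) + 0.006) * ln_na - 7.0 / (n_nt + 1) ** 2 - 0.01
    c1 = 0.07 * ln_na + 1.8
    d1 = 0.21 * ln_na + 1.5
    return -_bracket(a1, b1, c1, d1, n_nt, l_end)

def loop_dg_mg(mg, n_nt, l_end):
    """Loop free energy in Mg2+ (Eqs. 11.21 and 11.22), in k_B T."""
    ln_mg = math.log(mg)
    a2 = (-1.0 / (n_nt + 1) + 0.32) * ln_mg + 0.7 * n_nt + 0.43
    b2 = 0.0002 * (n_nt + 1) * ln_mg - 5.9 / (n_nt + 1) ** 2 - 0.003
    c2 = 0.067 * ln_mg + 2.2
    d2 = 0.163 * ln_mg + 2.53
    return -_bracket(a2, b2, c2, d2, n_nt, l_end)

def loop_dg_mixed(na, mg, n_nt, l_end):
    """Mixed-salt loop free energy (Eqs. 11.23 and 11.24), in k_B T."""
    if mg == 0.0:  # pure Na+ solution (added guard, an assumption)
        return loop_dg_na(na, n_nt, l_end)
    x = na / (na + (7.2 - 20.0 / n_nt) * (40.0 - math.log(na)) * mg)
    return x * loop_dg_na(na, n_nt, l_end) + (1.0 - x) * loop_dg_mg(mg, n_nt, l_end)

# Illustrative case: an 8-nt loop with l = 17 Å in 0.1 M Na+ / 0.01 M Mg2+.
print(loop_dg_mixed(0.1, 0.01, 8, 17.0))
```

Multiplying the result by \(k_{B}T\) (about 0.616 kcal/mol at 37 °C) converts it to kcal/mol.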

For a hairpin loop, the Na+/Mg2+-dependent thermodynamics can be obtained by fixing the loop end-to-end distance at ~17 Å. The thermodynamics of an RNA (or DNA) hairpin can then be calculated by the following formula [156]

$$\begin{aligned}\Delta G_{\text{hairpin}} & =\Delta H_{\text{stem}} - T\Delta S_{\text{stem}} +\Delta G_{{{\text{hairpin}}\;{\text{loop}}}} [{\text{Na}}^{ + } /{\text{Mg}}^{2 + } ]; \\\Delta H_{\text{stem}} & =\Delta H_{\text{stem}} [1\;{\text{M}}\;{\text{Na}}^{ + } ] +\Delta H_{{{\text{terminal}}\;{\text{mismatch}}}} ; \\\Delta S_{\text{stem}} & =\Delta S[{\text{Na}}^{ + } /{\text{Mg}}^{2 + } ] +\Delta S_{{{\text{terminal}}\;{\text{mismatch}}}} , \\ \end{aligned}$$
(11.25)

where \(\Delta H_{\text{stem}}[1\;{\text{M}}\;{\text{Na}}^{+}]\), \(\Delta H_{{{\text{terminal}}\;{\text{mismatch}}}}\), and \(\Delta S_{{{\text{terminal}}\;{\text{mismatch}}}}\) can be obtained from the nearest neighbour model with the measured thermodynamic parameters. \(\Delta S[{\text{Na}}^{+}/{\text{Mg}}^{2+}]\) can be given by the previously introduced empirical formulas (Eq. 11.14 for DNA and Eq. 11.18 for RNA). Extensive experimental comparisons show that the empirical formulas can make rather reliable predictions of hairpin stability in a Na+/Mg2+ solution; see Ref. [156].
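The bookkeeping of Eq. (11.25) can be sketched as below. The function name, the unit convention (kcal/mol for enthalpies and the loop free energy, cal/(mol K) for entropies) and the input numbers are assumptions chosen only to illustrate how the terms are assembled.

```python
def hairpin_dg(temp_k, dh_stem_1m, dh_tm, ds_stem_salt, ds_tm, dg_loop):
    """Assemble the hairpin free energy of Eq. (11.25).

    dh_stem_1m   : stem enthalpy at 1 M Na+ (kcal/mol)
    dh_tm, ds_tm : terminal-mismatch enthalpy (kcal/mol) and entropy (cal/(mol K))
    ds_stem_salt : salt-corrected stem entropy (cal/(mol K))
    dg_loop      : hairpin-loop free energy in Na+/Mg2+ (kcal/mol)
    """
    dh_stem = dh_stem_1m + dh_tm
    ds_stem = ds_stem_salt + ds_tm
    # Entropy is converted from cal to kcal before multiplying by T.
    return dh_stem - temp_k * ds_stem / 1000.0 + dg_loop

# Hypothetical inputs (placeholders, not measured parameters).
print(hairpin_dg(310.15, -50.0, -5.0, -140.0, -12.0, 4.5))
```

In this decomposition the salt dependence enters only through the stem entropy and the loop free energy, while the enthalpy terms are taken at the 1 M Na+ reference.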

In a very recent single-molecule experiment, quantitative comparisons with the measurements on a 20-bp RNA hairpin show that the above empirical formulas are rather accurate in describing the Na+/Mg2+-dependent thermodynamics of short RNA hairpins [146] (Fig. 11.4).

Fig. 11.4

The Mg2+ and Na+ binding fractions per nucleotide for various RNA/DNA molecules. The solid lines are from the empirical formulas (Eqs. 11.26 and 11.28); and the symbols are experimental data. a 24-bp DNA duplex in [Mg2+] with fixed [Na+] = 20 mM [132]; b 40-bp RNA duplex. The experimental data are for poly(A.U) [133]: From the left to right, [Na+] = 10, 29, 60, and 100 mM, respectively; c 40-bp DNA duplex. The experimental data are for the calf thymus DNA [134]; d BWYV pseudoknot RNA [135]; e 58-nt rRNA fragment [136]. Please note that the experimental data are for mixed Mg2+/K+ (not Mg2+/Na+) solution: from left to right, [K+] = 20, 40, 60, and 150 mM, respectively. Here we show the experimental data for semi-quantitative comparisons. f Yeast tRNAPhe [137, 138]

4.3.4 Modelling Ion Binding to RNA Tertiary Structures

As described above (Sect. 11.4.1.1), ion binding is critical for stabilizing RNA folded structures. The TBI theory has been developed for a static atomistic structure, and can quantify the ion atmosphere around an RNA (or DNA) with a complex 3D structure [177]. In a mixed Na+/Mg2+ solution, the binding of Na+ and the binding of Mg2+ are competitive and anti-cooperative. The TBI model has given an empirical equivalence relation between Mg2+ and Na+ as [177]

$$\log [{\text{Na}}^{ + } ]_{\text{Mg}} = A\log [{\text{Mg}}^{ 2+ } ] + B,$$
(11.26)

where [Na+] and [Mg2+] are both in millimolar (mM). A and B are two parameters depending on the (low-resolution) RNA (or DNA) structure

$$A = 0.65 + \frac{4.2}{N}\left( {\frac{{R_{g} }}{{R_{g}^{0} }}} \right)^{2} ;\quad B = 1.8 - \frac{9.8}{N}\left( {\frac{{R_{g} }}{{R_{g}^{0} }}} \right)^{2} ,$$
(11.27)

where N is the number of nucleotides of the RNA, \(R_{g}\) is the radius of gyration of the RNA (or DNA) backbone, and \(R_{g}^{0}\) is the radius of gyration of an N-nt RNA duplex.

Based on Eq. (11.26), the binding fractions of Na+ and Mg2+ can be calculated through [177]

$$f_{{{\text{Na}}^{ + } }} = \frac{{[{\text{Na}}^{ + } ]}}{{[{\text{Na}}^{ + } ] + [{\text{Na}}^{ + } ]_{\text{Mg}} }}f_{{{\text{Na}}^{ + } }}^{0} ;\quad f_{{{\text{Mg}}^{2 + } }} = \frac{{[{\text{Na}}^{ + } ]_{\text{Mg}} }}{{[{\text{Na}}^{ + } ] + [{\text{Na}}^{ + } ]_{\text{Mg}} }}f_{{{\text{Mg}}^{2 + } }}^{0} ,$$
(11.28)

where \([{\text{Na}}^{+}]_{\text{Mg}}\) is given by Eq. (11.26). \(f_{{{\text{Na}}^{ + } }}^{0}\) and \(f_{{{\text{Mg}}^{2 + } }}^{0}\) are the binding fractions for pure Na+ and pure Mg2+ solutions, respectively. Generally, \(f_{{{\text{Na}}^{ + } }}^{0} \approx 0.8\), and \(f_{{{\text{Mg}}^{2 + } }}^{0} \approx 0.47\) [177]. As shown in Fig. 11.4, the above empirical formulas can make reliable predictions for Na+/Mg2+ binding to RNAs/DNAs with complex 3D structures.
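The competitive-binding relations of Eqs. (11.26) to (11.28) can be sketched as follows. The function names are illustrative; the logarithm in Eq. (11.26) is assumed to be base 10, concentrations are in mM as stated above, and the Mg2+-free case is handled by an added guard.

```python
import math

def na_equivalent_of_mg(mg_mm, n_nt, rg_ratio):
    """[Na+]_Mg of Eqs. (11.26) and (11.27), in mM.

    rg_ratio = R_g / R_g^0; log is assumed to be base 10.
    """
    a = 0.65 + (4.2 / n_nt) * rg_ratio ** 2
    b = 1.8 - (9.8 / n_nt) * rg_ratio ** 2
    return 10.0 ** (a * math.log10(mg_mm) + b)

def binding_fractions(na_mm, mg_mm, n_nt, rg_ratio, f0_na=0.8, f0_mg=0.47):
    """Na+ and Mg2+ binding fractions per nucleotide, Eq. (11.28)."""
    if mg_mm == 0.0:  # pure Na+ solution (added guard, an assumption)
        return f0_na, 0.0
    na_mg = na_equivalent_of_mg(mg_mm, n_nt, rg_ratio)
    total = na_mm + na_mg
    return na_mm / total * f0_na, na_mg / total * f0_mg

# Illustrative case: 48 nucleotides with R_g = R_g^0, 20 mM Na+, 1 mM Mg2+.
f_na, f_mg = binding_fractions(20.0, 1.0, 48, 1.0)
print(f"f_Na = {f_na:.3f}, f_Mg = {f_mg:.3f}")
```

Raising [Mg2+] at fixed [Na+] increases \([{\text{Na}}^{+}]_{\text{Mg}}\), so the Mg2+ fraction grows at the expense of the Na+ fraction, which reflects the competition described above.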

4.3.5 Modelling Salt Contribution to RNA Tertiary Structure Stability

Since RNA folding is hierarchical, the tertiary structure folding can be crudely modelled as a two-state transition from an intermediate (I) to the native (N) state. Similarly to the helix stability, the RNA tertiary folding free energy can be decoupled into two contributions: an electrostatic part and a nonelectrostatic part [178]

$$\begin{aligned}\Delta G & =\Delta G^{E} [{\text{Na}}^{ + } / {\text{Mg}}^{ 2+ } ] +\Delta G^{NE} ; \\ & =\Delta G^{E} [{\text{Na}}^{ + } / {\text{Mg}}^{ 2+ } ] + \left( {\Delta G[{\text{expt\;Na}}^{ + } ] -\Delta G^{E} [{\text{expt\;Na}}^{ + } ]} \right); \\ & =\Delta G[{\text{expt\;Na}}^{ + } ] + \left( {\Delta G^{E} [{\text{Na}}^{ + } / {\text{Mg}}^{ 2+ } ] -\Delta G^{E} [{\text{expt\;Na}}^{ + } ]} \right), \\ \end{aligned}$$
(11.29)

where \(\Delta G[{\text{expt\;Na}}^{ + } ]\) is the experimental folding free energy at a reference ion condition. \(\Delta G^{E}\) can be given by the empirical formulas derived from the TBI model [178].

For an RNA folding in Na+ solutions, \(\Delta G^{E}\) can be calculated by the following empirical formula

$$\Delta G^{E} [{\text{Na}}^{ + } ] =\Delta G^{E} [1\;{\text{M}}\;{\text{Na}}^{ + } ] + a_{1} N\ln [{\text{Na}}^{ + } ] + b_{1} N\ln^{2} [{\text{Na}}^{ + } ],$$
(11.30)

where \(a_{1}\) and \(b_{1}\) are parameters related to the RNA folded structure. \(a_{1}\) and \(b_{1}\) can be formulated by [178]

$$\begin{aligned} a_{1} \times \varepsilon^{*} (T)T^{*} & = - 0.086 + 7/(Nr_{g}^{3} + 65); \\ b_{1} \times \varepsilon^{*} (T)T^{*} & = 0.008 - 3.6/(N - 5)^{2} , \\ \end{aligned}$$
(11.31)

where \(r_{g} = R_{g}^{0} /R_{g}\). ε*(T) = ε (T)/ε (298.15 K) is the relative dielectric constant, and T* = T/298.15 is the relative temperature.
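A minimal sketch of the Na+-dependent part of Eqs. (11.30) and (11.31) is given below. It returns only the shift relative to the 1 M Na+ reference, in the units of the source formulas. The function name is illustrative, [Na+] is assumed to be in molar, and the relative dielectric constant \(\varepsilon^{*}(T)\) must be supplied by the caller (the default of 1.0 corresponds to T = 298.15 K).

```python
import math

def dge_na_shift(na, n_nt, rg, eps_rel=1.0, temp_k=298.15):
    """dG^E[Na+] - dG^E[1 M Na+] from Eqs. (11.30) and (11.31).

    rg      : R_g^0 / R_g (compactness relative to an N-nt duplex)
    eps_rel : eps(T) / eps(298.15 K), supplied by the caller
    """
    scale = eps_rel * (temp_k / 298.15)  # eps*(T) T* on the left of Eq. (11.31)
    a1 = (-0.086 + 7.0 / (n_nt * rg ** 3 + 65.0)) / scale
    b1 = (0.008 - 3.6 / (n_nt - 5) ** 2) / scale
    ln_na = math.log(na)
    return a1 * n_nt * ln_na + b1 * n_nt * ln_na ** 2

# Illustrative case: a 58-nt RNA with r_g = 1.5 (hypothetical value) at 0.1 M Na+.
print(dge_na_shift(0.1, 58, 1.5))
```

The shift is zero at 1 M Na+ by construction, so the absolute value of \(\Delta G^{E}[1\;{\text{M}}\;{\text{Na}}^{+}]\) cancels when Eq. (11.29) is used to compare two Na+ conditions.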

For an RNA folding in Mg2+ solutions, the TBI model gives the following empirical formula for \(\Delta G^{E}\) [178]

$$\Delta G^{E} [{\text{Mg}}^{2 + } ] =\Delta G^{E} [1\;{\text{M}}\;{\text{Mg}}^{2 + } ] + a_{2} N\ln [{\text{Mg}}^{2 + } ] + b_{2} N\ln^{2} [{\text{Mg}}^{2 + } ] + c_{2} NT^{*} ,$$
(11.32)

where \(a_{2}\), \(b_{2}\), and \(c_{2}\) are given by

$$\begin{aligned} a_{2} \times \varepsilon^{*} (T)T^{*} & = 0.012 - 1.4/(Nr_{g}^{3} + 75); \\ b_{2} \times \varepsilon^{*} (T)T^{*} & = 0.0048 - 57/(Nr_{g}^{3} + N + 75)(N + 75); \\ c_{2} \times \varepsilon^{*} (T)T^{*} & = - 0.27 + 0.16/r_{g}^{3} + 1.4/N. \\ \end{aligned}$$
(11.33)

For RNA folding in a mixed Na+/Mg2+ solution, \(\Delta G^{E}\) is given by the empirical relation

$$\Delta G^{E} [{\text{Na}}^{ + } / {\text{Mg}}^{2 + } ] = x_{{3^{o} }}\Delta G^{E} [{\text{Na}}^{ + } ] + (1 - x_{{3^{o} }} )\Delta G^{E} [{\text{Mg}}^{2 + } ] + N\Delta g_{12} ,$$
(11.34)

where \(x_{{3^{o} }}\) denotes the contribution fraction from Na+, and \(\Delta g_{12}\) is a crossing term. \(x_{{3^{o} }}\) and \(\Delta g_{12}\) are given by

$$\begin{aligned} x_{{3^{o} }} & = \frac{{[{\text{Na}}^{ + } ]}}{{[{\text{Na}}^{ + } ] + \left( {3.8 - 34/(N - 20)r_{g}^{3} } \right)\left( {1 + 0.2[{\text{Na}}^{ + } ]} \right)[{\text{Mg}}^{2 + } ]^{0.64} }}; \\\Delta g_{12} & = - x_{{3^{o} }} (1 - x_{{3^{o} }} )(0.26 - 1.2/(N - 20)). \\ \end{aligned}$$
(11.35)

With the above empirical formulas for \(\Delta G^{E}\) and Eq. (11.29), the Na+/Mg2+-dependent RNA tertiary folding thermodynamics can be conveniently calculated. Figure 11.5 shows that Eq. (11.34) gives good estimates for the Mg2+-contribution to the total folding stability \(\Delta \Delta G_{{{\text{Mg}}^{ 2+ } }}^{E} =\Delta G_{{{\text{Na}}^{ + } , {\text{Mg}}^{ 2+ } }}^{E} -\Delta G_{{{\text{Na}}^{ + } , {\text{Mg}}^{ 2+ } = 0}}^{E}\), as compared with the experimental data. It is also shown that Eq. (11.29) with the empirical formulas for \(\Delta G^{E}\) (Eqs. 11.30–11.35) can give good evaluations of the Na+/Mg2+-dependent folding thermodynamics of small RNAs [178].

Fig. 11.5

The Mg2+-contribution \(\Delta \Delta G_{{{\text{Mg}}^{ 2+ } }}^{E}\) to RNA tertiary structure folding free energy as a function of [Mg2+] for three RNA molecules: BWYV pseudoknot (a), 58-nt ribosomal RNA fragment (b), and yeast tRNAPhe (c) at room temperature. Solid lines, empirical formulas derived from the TBI model; symbols, experimental data: a BWYV pseudoknot in 54 and 79 mM Na+ solution [135]; b 58-nt rRNA fragment in solution with 1.6 M monovalent ions [178, 197]; c yeast tRNAPhe in solution with 32 mM Na+ [138, 179, 197]

5 Perspectives

Although many RNA 3D structure modelling methods have been proposed, further development and refinement of the existing models are still required. Current algorithms have shown how the use of available experimental data can dramatically improve structure prediction, e.g. discrete molecular dynamics simulations with the use of HRP measurements can predict the structures of RNAs ranging in size from 80 to 230 nucleotides [26]. In addition, several algorithms have exploited the hierarchical nature of RNA folding [36–38], and consequently the prediction accuracy can be improved by adding the knowledge of secondary structure and tertiary contacts from experiments to the existing programs. However, some essential problems remain challenging, including: (1) Can the structures of larger RNA molecules be predicted reliably and efficiently? (2) Can RNA structures be predicted for different environments (temperature, ion conditions, etc.)?

Despite the significant progress, modelling of RNA folding dynamics remains a challenging problem. The current form of the theories involves several limitations. First, the theories do not treat folding/unfolding of tertiary folds such as pseudoknots. Second, the theories cannot treat, at the explicitly atomistic level, the effects of cofactors such as magnesium ions, ligands and proteins. In the future, we expect that the RNA folding kinetic theories can overcome these limitations and will be applicable to design RNA and DNA molecules with particular dynamic properties, which is of great importance in the emerging fields of synthetic biology and nucleic acid-based nanotechnology.

Extensive investigations have significantly enhanced the qualitative and quantitative understanding of ion effects in RNA folding. However, the quantitative understanding of ion roles is still challenging, at least for the following issues: (1) How are the specific properties of ions correlated with their specific roles in RNA folding? (2) Does the high efficiency of multivalent ions come from inter-ion Coulombic correlations? (3) What are the roles of the specific-site binding of Mg2+ in RNA tertiary folding [212]? Further issues related to RNA ion electrostatics include: (1) What is the role of ions in RNA-ligand interactions? (2) What is the role of ions in RNA-protein interactions? Answering these questions requires the further development of theoretical modelling [209–211], combined with progress in experiments.

Most of the progress on the RNA folding problem introduced above was obtained for in vitro systems, while in cells, RNAs are surrounded by many other macromolecules. Therefore, in reality, an RNA folds in a possibly interactive and dynamic confined space [213–216]. The limited existing investigations have revealed that spatial confinement may significantly influence the folded structure and the role of ions in folding [179, 214–216]. Further investigations of RNA folding should also account for the complex effects of the other macromolecules in vivo.