Chemical reaction optimization for RNA structure prediction

Kabir, Rayhanul; Islam, Rafiqul

doi:10.1007/s10489-018-1281-4


Published: 30 August 2018

Volume 49, pages 352–375, (2019)

Applied Intelligence


Abstract

RNA Structure Prediction (RSP) is an optimization problem in which a stable secondary structure is obtained from an RNA primary sequence. Many exact and metaheuristic algorithms have been developed in recent years to solve the RSP problem. We propose an approach based on the Chemical Reaction Optimization (CRO) metaheuristic to solve it. CRO is a population-based metaheuristic that has been applied to various optimization problems, on which it has outperformed related existing algorithms. We have redesigned the reaction operators of the CRO algorithm and calculate the minimum free energy of the RNA structure to solve the RSP problem. The operators spread the population over the entire solution space using both local and global searches and find better structures, which distinguishes the proposed algorithm. We have designed a novel operator, the repair function, that verifies a solution of an RNA sequence and removes repeated stems, which makes the process more time efficient. Both solution quality and execution time are considered in designing the basic operators and the repair function, so the proposed methodology is robust, efficient, and effective in solving the problem. The results of the proposed CRO-based algorithm are compared with those of a genetic algorithm (RNAPredict), a simulated annealing algorithm (SARNA-Predict), the coincidence algorithm (COIN), the two-level particle swarm optimization algorithm (TL-PSOfold), and the Changing Range Bat Algorithm (CRBA), and the proposed work gives better results than all of them. Significance testing using the Kruskal-Wallis test followed by post-hoc analysis also shows that the proposed work outperforms the five related methods.


1 Introduction

RNA plays a fundamental role in protein synthesis and is also important in genetic and evolutionary processes. RNA is a macromolecule built from a sequence of nucleotides. RNA contains four types of nucleotides: adenine (A), uracil (U), cytosine (C), and guanine (G). RNA is usually single-stranded, but stable double-helix structures can form between complementary strands. RNA bends and folds back on itself to form hydrogen bonds between complementary nucleotides, building base pairs. The canonical base pairs in RNA are the stable Watson-Crick pairs A-U and C-G and the less stable 'wobble' pair G-U [1]. Three categories of RNA are used in protein synthesis: messenger RNA (mRNA), ribosomal RNA (rRNA), and transfer RNA (tRNA). The mRNA carries the information needed for protein synthesis, the rRNA is a component of the ribosome, and the tRNA transfers amino acids to the ribosome as the basic materials for protein synthesis [2].
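To make the pairing rule concrete, the following Python sketch checks whether two nucleotides form one of the canonical pairs named above (A-U, C-G, and the G-U wobble). The function name `can_pair` and the `allow_wobble` switch are illustrative assumptions, not part of the proposed method.

```python
# Canonical RNA base pairs: Watson-Crick A-U and C-G plus the G-U wobble.
# Pairing is symmetric, so frozensets make A-U and U-A compare equal.
CANONICAL_PAIRS = frozenset({frozenset("AU"), frozenset("CG"), frozenset("GU")})


def can_pair(x: str, y: str, allow_wobble: bool = True) -> bool:
    """Return True if nucleotides x and y form a canonical base pair."""
    pair = frozenset((x.upper(), y.upper()))
    if not allow_wobble and pair == frozenset("GU"):
        return False  # caller asked for Watson-Crick pairs only
    return pair in CANONICAL_PAIRS
```

For example, `can_pair("g", "u")` accepts the wobble pair, while `can_pair("G", "U", allow_wobble=False)` rejects it.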

RNA structure is described at three levels. The nucleotide sequence is the primary structure, the set of bonded base pairs drawn in two dimensions is the secondary structure, and the arrangement of the molecule in 3D space is the tertiary structure. RNA structure prediction computes the possible legal stems of a sequence and selects a subset of them that forms an optimal secondary structure. Determining RNA structure has become a goal of many researchers because it is central to developing new drugs and identifying genetic diseases. Determining the secondary structure is the first step in predicting the 3D structure of RNA and in interpreting the biological function of RNA molecules, and the secondary structure itself provides a great deal of information about the molecule. X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy can also be used to obtain the secondary structure, but these processes are difficult, slow, and costly. Mathematical and computational methods are therefore needed to solve the RSP problem.
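The idea of a legal stem can be illustrated with a small enumerator. This sketch assumes a stem is a run of at least `min_len` stacked canonical pairs (i, j), (i+1, j-1), … that leaves at least `min_loop` unpaired nucleotides in the hairpin loop. Both thresholds and the names `enumerate_stems` and `Stem` are hypothetical choices for illustration, not the stem-generation procedure used in this paper.

```python
from typing import List, NamedTuple

# Canonical pairs (Watson-Crick plus G-U wobble), in both orientations.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


class Stem(NamedTuple):
    i: int       # 0-based index of the outermost 5' nucleotide
    j: int       # 0-based index of the outermost 3' nucleotide
    length: int  # number of stacked base pairs


def enumerate_stems(seq: str, min_len: int = 3, min_loop: int = 3) -> List[Stem]:
    """Return every maximal stem with at least min_len stacked pairs."""
    seq = seq.upper()
    n = len(seq)
    stems: List[Stem] = []
    for i in range(n):
        for j in range(n - 1, i, -1):
            # Start only from an outermost pair, so each stem is reported once.
            if i > 0 and j < n - 1 and (seq[i - 1], seq[j + 1]) in PAIRS:
                continue
            length = 0
            # Extend inward while at least min_loop nucleotides would remain
            # inside the pair being added and the two bases can pair.
            while ((j - length) - (i + length) - 1 >= min_loop
                   and (seq[i + length], seq[j - length]) in PAIRS):
                length += 1
            if length >= min_len:
                stems.append(Stem(i, j, length))
    return stems
```

For `"GGGAAAUCCC"` this reports, among others, the three-pair stem `Stem(0, 9, 3)` closing the `AAAU` hairpin loop.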

Dynamic programming (DP) was the first approach to the RSP problem based on free-energy minimization, and it successfully predicts the secondary structures of small pseudoknot-free RNA molecules [3]. The DP-based mfold was developed by Zuker [4] to predict pseudoknot-free RNA secondary structures. Thermodynamic models are used to calculate the free energy of an RNA secondary structure [5].
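As a simplified illustration of the DP idea, the sketch below implements the classic Nussinov-style recursion, which maximizes the number of base pairs rather than minimizing free energy. This is a swapped-in model that is much simpler than Zuker's energy-based mfold; the `min_loop` hairpin constraint and the pair set (including G-U) are assumptions.

```python
# Canonical pairs (Watson-Crick plus G-U wobble), in both orientations.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


def nussinov_max_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of non-crossing base pairs in seq (Nussinov-style DP).

    dp[i][j] holds the best pair count for the subsequence seq[i..j].
    A pair (i, k) must enclose at least min_loop unpaired nucleotides.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    # Fill by increasing span; spans shorter than min_loop + 1 cannot pair.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]  # case 1: nucleotide i stays unpaired
            # case 2: i pairs with some k, splitting seq[i..j] into two parts
            for k in range(i + min_loop + 1, j + 1):
                if (seq[i], seq[k]) in PAIRS:
                    inside = dp[i + 1][k - 1]
                    rest = dp[k + 1][j] if k < j else 0
                    best = max(best, inside + 1 + rest)
            dp[i][j] = best
    return dp[0][n - 1]
```

For `"GGGAAAUCCC"` the recursion returns 3, matching the single three-pair hairpin. An energy-based method such as mfold replaces the "+1 per pair" score with loop and stacking free energies.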

The use of simulated annealing (SA) for predicting RNA secondary structures was first described by Schmitz and Steger [6]. In this method, single base pairs are iteratively formed and broken under SA. Tsang and Grypma presented SARNA-Predict, a permutation-based algorithm for the RSP problem that is based on simulated annealing [2], and also studied the algorithm's synchronism behavior. The method requires removing conflicting helices to correct the structure, which is tedious, especially for sequences with a large number of possible helices. N. McMillan investigated a solution to the RSP problem using Ant Colony Optimization (ACO) [3]. First, all feasible stems of a given RNA sequence are identified with a brute-force algorithm. Then each ant adds new stems probabilistically to build a candidate secondary structure. The procedure is repeated for a number of ants, and in each iteration the pheromone trails are updated according to the best ant, i.e., the one whose structure has the minimum free energy.
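The core of any SA-based method, including those discussed above, is the Metropolis acceptance rule combined with a cooling schedule. The generic sketch below assumes a geometric cooling schedule (`cooling`) and a user-supplied `neighbor` move. It is not the move set or schedule of Schmitz and Steger or of SARNA-Predict.

```python
import math
import random
from typing import Callable, Tuple, TypeVar

State = TypeVar("State")


def accept(delta_e: float, temperature: float, rng: random.Random) -> bool:
    """Metropolis rule: always take improvements; take a worse move
    (delta_e > 0) with probability exp(-delta_e / temperature)."""
    if delta_e <= 0:
        return True
    if temperature <= 0:
        return False
    return rng.random() < math.exp(-delta_e / temperature)


def anneal(state: State,
           energy: Callable[[State], float],
           neighbor: Callable[[State, random.Random], State],
           t0: float = 10.0, cooling: float = 0.95,
           steps: int = 1000, seed: int = 0) -> Tuple[State, float]:
    """Generic SA loop with geometric cooling; returns the best state seen."""
    rng = random.Random(seed)
    current, e_cur = state, energy(state)
    best, e_best = current, e_cur
    t = t0
    for _ in range(steps):
        candidate = neighbor(current, rng)
        e_cand = energy(candidate)
        if accept(e_cand - e_cur, t, rng):
            current, e_cur = candidate, e_cand
            if e_cur < e_best:
                best, e_best = current, e_cur
        t *= cooling  # geometric cooling schedule
    return best, e_best
```

As a toy usage, `anneal(0, lambda x: (x - 3) ** 2, lambda x, r: x + r.choice((-1, 1)))` searches the integers for the minimum of a parabola. For RSP, the state would be a candidate structure and the energy its free energy.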

Particle Swarm Optimization (PSO) has proven useful for many types of optimization problems and is known for its ability to locate globally optimal solutions in the search space [7]. Several PSO-based methods have been introduced for the RSP problem. Geis and Middendorf introduced HelixPSO to find RNA secondary structures with minimum free energy [8]. Neethling and Engelbrecht proposed a set-based PSO that optimizes RNA structure using an advanced thermodynamic model [9]. Liu presented an improved PSO (IPSO) model [10] whose objective function is based on the number of selected stems, their average length, and the minimum free energy. Xing introduced PSOfold, based on an improved PSO in which an adaptive parameter controller balances exploration and exploitation [11, 12]; however, PSOfold ignores some crucial stems when predicting the secondary structure. Wiese proposed RNAPredict, a genetic algorithm (GA) based method that uses thermodynamic models [7]. In this method, a helix generation algorithm produces the set of feasible helices for a given RNA sequence [1], and each chromosome is a permutation of the helix numbers. If two or more helices in a chromosome share base pairs, the conflicting helices are not added to the RNA fold. Crossover and mutation operators then improve the chromosomes to reach a minimum free energy [13]. This method is time-consuming for sequences with a large number of helices. The Bat algorithm is another metaheuristic for global optimization; it models the echolocation behavior of microbats. Zhihua, Li, Cao and Zhu introduced the Changing Range Bat Algorithm (CRBA), an updated version of the Bat algorithm, for RNA secondary structure prediction [32]. CRBA was evaluated on ten short sequences, and its predictions were compared with those of mfold.
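The permutation encoding described for RNAPredict (and used by permutation-based SA) can be decoded into a structure with a greedy pass. In this sketch, a helix is a list of (i, j) base pairs, and a helix is treated as conflicting if it reuses any nucleotide position already taken by an earlier helix in the permutation. This conflict rule, the name `decode_permutation`, and the omission of pseudoknot checks are assumptions for illustration.

```python
from typing import Dict, List, Sequence, Set, Tuple

Helix = List[Tuple[int, int]]  # base pairs (i, j) with 0-based positions


def decode_permutation(perm: Sequence[int], helices: Dict[int, Helix]) -> List[int]:
    """Greedy decoding of a helix permutation into a conflict-free helix set.

    Helices are visited in permutation order; a helix is kept only if none of
    its nucleotide positions is already used by a previously kept helix.
    """
    used: Set[int] = set()
    kept: List[int] = []
    for h in perm:
        positions = {p for pair in helices[h] for p in pair}
        if positions & used:
            continue  # conflicting helix: leave it out of the fold
        used |= positions
        kept.append(h)
    return kept
```

Because the decoding is order-dependent, two permutations of the same helices can yield different folds. This is what lets crossover and mutation on permutations explore different structures.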

In this paper, we propose an algorithm based on Chemical Reaction Optimization (CRO) to solve the RNA structure prediction problem. Our main goal is to find the most stable secondary structure of an RNA sequence. A chemical reaction is a process that rearranges the molecular and ionic structure of substances. A molecule or ion that is not in a stable state tends to reach stability through chemical reactions, and the CRO algorithm mimics this behavior. An important feature of CRO is its search ability: it has both local and global search properties. Another feature is its high flexibility in designing reaction operators and generating the population. These two features make CRO adaptable to many optimization problems and help it find globally best results. In recent years, CRO has been applied successfully to many optimization problems and has outperformed other metaheuristic approaches on them. On the 0-1 knapsack problem, CRO with a greedy strategy outperformed ant colony optimization, the genetic algorithm, and the quantum-inspired evolutionary algorithm [14]. CRO has also been applied to the quadratic assignment problem [15], the cognitive radio spectrum allocation problem [16], and network coding optimization [17]. On the multiple-choice 0-1 knapsack problem, artificial CRO outperforms the genetic algorithm [18].
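For readers unfamiliar with CRO, the following sketch shows the energy bookkeeping of a generic on-wall ineffective collision as formulated in the original CRO framework of Lam and Li. Each molecule carries a potential energy (PE, the objective value) and a kinetic energy (KE). A new structure is accepted only if PE + KE covers its PE, and part of the surplus is moved to a central energy buffer. The `neighbor` function, the `ke_loss_rate` value, and the stem-list encoding are hypothetical. This is not the redesigned operator proposed in this paper.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Molecule:
    structure: List[int]  # hypothetical encoding: an ordered list of stem numbers
    pe: float             # potential energy = objective value (e.g. free energy)
    ke: float             # kinetic energy = tolerance for accepting a worse structure


def on_wall_collision(mol: Molecule,
                      neighbor: Callable[[List[int], random.Random], List[int]],
                      objective: Callable[[List[int]], float],
                      buffer: float, ke_loss_rate: float,
                      rng: random.Random) -> Tuple[Molecule, float]:
    """Generic CRO on-wall ineffective collision; returns (molecule, buffer)."""
    new_structure = neighbor(mol.structure, rng)
    new_pe = objective(new_structure)
    surplus = mol.pe + mol.ke - new_pe
    if surplus < 0:
        # Not enough energy to reach the new structure: keep the old molecule.
        return mol, buffer
    # Keep a random fraction q of the surplus as KE; the rest goes to the
    # buffer, so PE + KE + buffer is conserved.
    q = rng.uniform(ke_loss_rate, 1.0)
    return Molecule(new_structure, new_pe, surplus * q), buffer + surplus * (1.0 - q)
```

The KE term is what gives CRO its ability to escape local minima: a molecule with spare KE can move to a slightly worse structure, and the energy lost to the buffer is later reusable by decomposition.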

The contributions and novelty of this work are summarized below.

1. We have redesigned four reaction operators (on-wall ineffective collision, decomposition, inter-molecular ineffective collision, and synthesis) to find the global optimum. These operators enable the search for the best structures of RNA sequences, and because of them the proposed method gives the most stable structures for both short and long sequences.
2. A novel, efficient approach is proposed. The Chemical Reaction Optimization (CRO) algorithm has been successfully applied to various NP-hard problems, but it had not previously been used or proposed for the RNA structure prediction problem.
3. A new solution generation process has been introduced for the CRO algorithm. The process efficiently generates valid solutions for the RNA structure prediction problem.
4. The repair function is another novel contribution. It verifies a solution and removes duplicate stem numbers. When a solution generated by any of the four basic CRO operators contains duplicate stem numbers, those stems are processed repeatedly while the secondary structure is constructed, which takes considerable time. The repair function therefore makes the CRO algorithm robust and time efficient.
5. Our proposed work follows the individual nearest-neighbor hydrogen bond model (INN-HB), one of a group of thermodynamic models, to estimate the minimum free energy (MFE). This model is simple to implement and takes less time to obtain the minimum free energy than the other models. These properties help the proposed work give better results than other methods. The outcomes of the proposed work are compared with those of previous related methods, namely RNAPredict (GA) [7], SARNA-Predict (SA) [2], COIN [13], TL-PSOfold [19], and CRBA [32], to show the performance of the proposed method.
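A minimal sketch of the duplicate-removal step described in item 4, assuming a solution is an ordered list of stem numbers in which earlier entries take priority. The name `repair` is hypothetical, and the paper's repair function may perform additional checks not shown here.

```python
from typing import List


def repair(solution: List[int]) -> List[int]:
    """Hypothetical repair step: drop repeated stem numbers, keeping the first
    occurrence of each so the original ordering (priority) is preserved."""
    seen = set()
    repaired: List[int] = []
    for stem in solution:
        if stem not in seen:
            seen.add(stem)
            repaired.append(stem)
    return repaired
```

The set lookup keeps the pass linear in the solution length; `list(dict.fromkeys(solution))` is an equivalent one-line idiom in Python 3.7+.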

2 Problem statement

RSP is the problem of predicting RNA secondary structure: given an RNA sequence, compute its correct secondary structure. The secondary structure of RNA is described by a list of base pairs formed from the primary sequence.
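A common textual encoding of such a base-pair list is dot-bracket notation, in which each paired nucleotide is written as a matching '(' and ')' and each unpaired one as '.'. The sketch below converts a pseudoknot-free dot-bracket string into 0-based (i, j) pairs using a stack. Dot-bracket notation is used here only for illustration and is not necessarily the representation used in this work.

```python
from typing import List, Tuple


def dot_bracket_to_pairs(db: str) -> List[Tuple[int, int]]:
    """Convert a pseudoknot-free dot-bracket string to sorted (i, j) pairs."""
    stack: List[int] = []  # positions of '(' still waiting for a partner
    pairs: List[Tuple[int, int]] = []
    for j, ch in enumerate(db):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((stack.pop(), j))  # innermost open bracket pairs with j
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {j}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)
```

For example, `"(((....)))"` becomes `[(0, 9), (1, 8), (2, 7)]`, a single stem of three pairs.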

Let $S = s_1 s_2 \ldots s_n$ be an RNA sequence, where $S$ is a string over the alphabet $\Sigma = \{a, u, g, c\}$. A pair (x, y) is called a complementary base pair if {x, y} = {a, u} or {x, y} = {g, c}. Pairs such as {a, g}, {c, u}, and {a, c} are not treated as base pairs [20]. The most stable and common base pairs are {g, c}, {a, u}, and {g, u}, together with their reverses {c, g}, {u, a}, and {u, g}. Once the pairs are formed, the RNA strand folds back to produce the secondary structure. Our objective is to maximize the number of stems used to build the secondary structure of a given sequence and to select the most stable structure. The stability of a structure depends on its Gibbs free energy (ΔG), which is used to compare the total energy of different structures of the same sequence; the structure with the minimum energy is accepted. We use the individual nearest-neighbor hydrogen bond model (INN-HB) [21] to calculate the free energy of a helix in the RNA secondary structure. The objective function for RNA structure prediction is defined as follows.

$$ R = \min_{1 \leq i \leq n} \{\Delta G_i\}, \quad \text{where } n = \text{number of secondary structures for one sequence} $$

(1)

$$ \Delta G^{\circ}_{37} = \Delta G^{\circ}_{37,\mathrm{init}} + \sum \Delta G^{\circ}_{37,\mathrm{NN}} + \Delta G^{\circ}_{37,\mathrm{AU/GU\ end}}\ (\text{per AU/GU end}) + \Delta G^{\circ}_{37,\mathrm{sym}} $$

(2)

Equation (2) is widely used to calculate the fitness of an RNA secondary structure; its parameter values are taken from [22]. Table 1 gives the meaning of each symbol.
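Equation (2) can be evaluated for a single helix as sketched below. The helix is assumed to be a list of base pairs (x, y) ordered from one end to the other, with x on the 5'→3' strand. Each adjacent pair of base pairs contributes one nearest-neighbor stack, looked up under a hypothetical key of the form "XY/X'Y'" (or its equivalent read from the other strand). The parameter values in the usage example are toy placeholders, not the values from [22].

```python
from typing import Dict, List, Tuple

BasePair = Tuple[str, str]  # (x, y): x on the 5'->3' strand, y its partner

# Terminal pairs that incur the per-end penalty in Eq. (2).
TERMINAL_PENALTY_PAIRS = {frozenset("AU"), frozenset("GU")}


def stack_energy(stack: Dict[str, float], p1: BasePair, p2: BasePair) -> float:
    """Look up the NN term for 5'-x1 x2-3' / 3'-y1 y2-5' (hypothetical key format)."""
    (x1, y1), (x2, y2) = p1, p2
    key = f"{x1}{x2}/{y1}{y2}"
    alt = f"{y2}{y1}/{x2}{x1}"  # the same stack read from the other strand
    if key in stack:
        return stack[key]
    if alt in stack:
        return stack[alt]
    raise KeyError(f"no nearest-neighbor parameter for {key} or {alt}")


def helix_free_energy(helix: List[BasePair], stack: Dict[str, float], init: float,
                      au_gu_end: float, sym: float = 0.0,
                      symmetric: bool = False) -> float:
    """Eq. (2): init + sum of NN stacks + per-AU/GU-end penalty + symmetry term."""
    if len(helix) < 2:
        raise ValueError("a helix needs at least two stacked base pairs")
    helix = [(x.upper(), y.upper()) for x, y in helix]
    nn_total = sum(stack_energy(stack, helix[k], helix[k + 1])
                   for k in range(len(helix) - 1))
    ends = sum(1 for x, y in (helix[0], helix[-1])
               if frozenset((x, y)) in TERMINAL_PENALTY_PAIRS)
    total = init + nn_total + au_gu_end * ends
    if symmetric:
        total += sym  # only for self-complementary duplexes
    return total
```

With toy values `{"GG/CC": -3.0, "GA/CU": -2.0}`, `init=4.0`, and `au_gu_end=0.5`, the helix G-C, G-C, A-U gives 4.0 - 5.0 + 0.5 = -0.5 (one AU end). Real use requires the full INN-HB parameter table from [22].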

Table 1 Symbol table for (2)
