Biological complexity: ant colony meta-heuristic optimization algorithm for protein folding

Kaushik, Aman Chandra; Sahi, Shakti

doi:10.1007/s00521-016-2252-5

Biological complexity: ant colony meta-heuristic optimization algorithm for protein folding

Original Article
Published: 07 March 2016

Volume 28, pages 3385–3391, (2017)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Neural Computing and Applications Aims and scope Submit manuscript

Biological complexity: ant colony meta-heuristic optimization algorithm for protein folding

Download PDF

Aman Chandra Kaushik¹ &
Shakti Sahi¹

370 Accesses
10 Citations
Explore all metrics

Abstract

Ant colony meta-heuristic optimization (ACO) is one of the few algorithms that can help to gain an atomic level insight into the conformation of protein folding states, intermediate weights and pheromones present along the protein folding pathway. These are analysed by nodes (amino acids), and these nodes depend upon the probability of next optimized node (amino acids). Nodes have conformational degrees of freedom as well as depend upon the natural factors and collective behaviour of biologically important molecules like temperature, volume, pressure and other ensembles. This biological quantum complexity can be resolved using ACO algorithm. Ants are visually blind and important behaviour of communication among individuals or colony of ant environment is based on chemicals (pheromones) deposited by the ants. Just like ants, proteins are also a group of colony; amino acids are node (amino acid) attached to each others with the help of bonds. This paper is aimed to determine the factors affecting protein folding pattern using ant colony algorithm. Protein occurs structurally in a compact form and determining the ways of protein folding is called NP hard (non-deterministic polynomial-time hard) problem. Using the ACO, we have developed an algorithm for protein folding. It is interesting to note that based on ants ability to find new shorter path between the nest and the food, proteins can also be optimized for shorter path between one node to another node and the folding pattern can be predicted for an unknown protein (ab initio). We have developed an application based on ACO in Perl language (PFEBRT) for determining optimized folding path of proteins.

Protein tertiary structure prediction by integrating ant colony optimization with path relinking and structure knowledge

Article 07 March 2023

New Ant Colony Optimization Including Weights

The 2D and 3D Protein Sequencing Implementation Using Ant Colony Optimization Algorithm

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Various kinds of optimization algorithms have been implemented to tackle folding in homology modelling, threading and ab initio based on artificial intelligence and hybrid approaches. Protein folding is non-deterministic polynomial-time hard problem for identifying protein conformation and folding process. Ant colony optimization (ACO) is used in various problems like routing vehicles, dynamic problems, stochastic problems, multi-target implementation, multi-target parallel implementations, software testing, travelling salesman problem and protein folding. Ant colony meta-heuristic optimization is one of the few algorithms that can help to gain an atomic level insight into the conformation of protein folding states and its intermediate weights and pheromone present along the protein folding. This ant colony optimization algorithm is inspired by research on the behaviour of real ant colonies. Ant colonies are distributed systems which perform complex tasks for finding food in an optimized way. Ant algorithms are derived from the observation of real ant’s behaviour and optimized for distributed control problems [1]. The different aspects of the behaviour of ant colonies are useful in solving computational protein folding problems. The main logic behind ant colony algorithms is to solve the NP hard problem of protein folding mechanism [2]. The ants exhibit a complex behavioural pattern which helps them to find optimized shorter path between the nest and the food. Ants are visually impaired and an important aspect of their communication is based on chemicals released and deposited (pheromones) by them [3]. They tend to choose paths marked by strong pheromone concentration (pheromone trail) for food. Proteins may also be considered as a colony where amino acids are nodes attached to each other with the help of peptide bonds. Protein structure folds in a compact form. Using ACO behaviour, researchers can identify the folding pattern of proteins [4]. Similar to the behavioural pattern exhibited by ants, proteins can also be optimized to have a shorter connecting path between one amino acid to another amino acid [5]. An interesting experiment was conducted on ants involving a double bridge connecting the ant nest to food source to study the pheromone trail laying and following behaviour. Initially, only the long branch was opened to ants and later the short branch was also offered. Since the starting pheromone concentration was high on the long branch with slow evaporation of pheromone, so majority of ants always chose the long branch even after the appearance of shorter branch [6, 7]. Similarly, in protein structure starting amino acid bonding is very strong on the long branch with standard distance factors using general AMBER force field (GAFF), and other factors like temperature, pressure, volume, degree of freedom and total time of protein folding. With these ensembles, proteins are also dependent on standard distance factors; when distance factors are changed in proteins, it causes random or specific mutation in proteins resulting in diseases.

1.1 Monte Carlo simulation

The Monte Carlo simulation (MC) algorithms is based on the energy distribution for a given protein temperature and executes temperature simulation for protein folding pattern [8–10]. MC algorithms are based on conformational states and their searching efficiency has been enhanced in the Basin Hopping approach, which couples large step Monte Carlo jumps with gradient-driven local minimization [11, 12]. After searching the final energy distribution, it modifies the transition probability to accelerate the transition between different states [13]. Monte Carlo minimization has been successfully applied to the conformational searching in protein folding pattern [2] by executing the local energy minimization of each trajectory of protein folding pattern [14]. Monte Carlo approach allows efficient exploration of protein conformation and is comparable with genetic algorithm and other heuristic approaches.

1.2 Molecular dynamics

Molecular dynamics (MD) simulation is based on Newton’s equations of motion. It monitors atom movements during protein folding pattern [15] and is one of the most useful methods for known biological complexity of protein folding problem [16, 17]. The long MD simulation is a major limiting factor as the incremental timescale is in the order of femtoseconds, while the fastest protein folding pattern timing of a small protein less than 100 residues is in the millisecond range [18–22]. There are many softwares for MD simulation which are also used for the structure refinements of low-resolution model [23, 24]. Number of parameters like implemented torsions, ensembles and coarse-grained energy functions are used for refinement [25, 26]. Molecular dynamics simulation is based on Newton’s equations of motion to all atoms concurrently over a small time step to conclude new atomic positions and velocities. In cases of Monte Carlo (MC) and molecular dynamics (MD), the force field controls the total energy, which concludes the evolution of the systems. Molecular dynamics simulation is proven to be a powerful approach for studying protein dynamics.

1.3 Genetic algorithm

Conformational space annealing is based on one of the genetic algorithms (GAs) for protein folding pattern. Using GA folding pattern of the proteins can be identified [27, 28]. The MC algorithms are based on local minima, searches whole conformation of the protein and generate low-energy conformation, while conformational space annealing applies various global optimizations, searches whole conformation of the protein and generates low energy for folding pattern of the protein [29, 30]. Conformational space annealing has been successfully applied to ab initio modelling of the protein.

2 Materials and methods

The flow chart (Fig. 1) represents the movement of ant (protein) based on two parameters A (heuristics as a force fields of amino acid to other amino acids) and B (pheromones as a distance of amino acid to other amino acids).

2.1 Ant colony algorithms

1.
Implementation The methodology for ant colony optimization algorithm implementation for the protein folding pattern, which is NP hard problem [31], and generation of various conformational states using ACO [32–37] is shown in Fig. 1. A and B are two parameters where A (heuristics as a force fields of amino acid to other amino acids) and B (pheromones as a distance of amino acid to other amino acids) determine the relative influence of the force fields (taken from the GAFF) or distance factors (taken from the GAFF) [38]. These factors are responsible for the trail of amino acids and the heuristic information on the path traversed by amino acids. Initialization (I) is initial distance factor and heuristic value deposited on each amino acid in proteins, respectively. Initially, (I) is set equal to the depth of proteins and I is set equal to the number of decision node (amino acid). This variable specifies the count status of node (amino acid) by the amino acids k, initially set to 0. N ^k_i is the feasible neighbourhood of amino acids k when being at node i, initially set to zero. Key = End node in proteins, Pframe = [Total no of node (amino acids in the protein)]. Pframe = [1, 2, 3, 4, 5…EndAA], NC = total node sequence covered up to now, calculation of depth of proteins using algorithm ACO_ DEPTH. Initialization Init accordingly and determination of number of decision node in proteins and setting up the initial value. When expanded a fractional conformation I _k…I _i to I _i+1 during the edifice phase of ant colony optimization algorithms, next to kin direction d of I _i+1 to I _i−1 is resolved based on heuristic (force fields K _ij) and pheromone (distance of amino acids r _ij) values according to following probabilities $P_{i} d = \left( {T^{\alpha }_{xy} } \right)\left( {\eta^{\beta }_{xy} } \right)/\sum_{y} \in {\text{allowed}}_{y} \left( {T^{\alpha }_{xy} } \right)\left( {\eta^{\beta }_{xy} } \right).$
2.
Application development We developed Perl application, for validation of proposed algorithm and also to find out protein folding pattern of the protein using ACO algorithm. In this application by selecting PFP button of the application, the desired protein in PDB format can be selected. This would generate coordinates of given PDB file and save it into analysis file folder. This output can be used to investigate the optimized folding pattern of the given PDB file.

3 Results and discussion

We report ACO algorithm for protein (including membrane protein) folding prediction. We have designed new algorithm for protein conformation and protein folding. The algorithm traverses each node and prioritizes the path according to the path strength. Paths having the standard distance factors strength are given the highest priority for testing followed by next lower standard distance factors strength. This algorithm finds optimized path for protein folding (native conformations) within nanoseconds of CPU. The protein with the PDB code: 4BEY (night blindness causing G90D rhodopsin) was used as an example (Fig. 1) to prioritize the various conformation states using ant colony optimization algorithms. The algorithm was applied as follows.

GAFF (general AMBER force field) is well matched to the AMBER force fields, GAFF is appropriate to study range of molecules, and generally, all the organic molecules are made of C, H, S, O, N, P, F, Br, Cl and I. The interaction force fields (K _ij) parameter between one molecule to another molecule and interaction distance (r _ij) parameter between one molecule to another molecule are described in supplementary Table 1. AMBER and GAFF force fields have been reported to work well in case of drug designing, biological molecules and organic molecules. GAFF is more compatible for rational drug designing and applies harmonic function form as following

$$\begin{aligned} E_{\text{Pair}} & = \sum\limits_{\text{Bonds}} {K_{r} \left( {r - r_{\text{eq}} } \right)^{2} + \sum\limits_{\text{Angles}} {K_{\theta } \left( {\theta - \theta_{\text{eq}} } \right)^{2} } } + \sum\limits_{\text{Dihedrals}} {V_{n} /2\left[ {1 + {\text{COS}}\left( {n\varphi - \varUpsilon } \right)} \right]} \\ & \quad + \sum\limits_{i < j} {\left[ {A_{ij} /R_{ij}^{12} - B_{ij} /R_{ij}^{6} + q_{i} q_{j} /R_{ij} } \right]} . \\ \end{aligned}$$

3.1 ACO algorithms implementation for identification of protein folding pattern using standard distance and generalized AMBER force fields (GAFF) parameters

1.
Firstly, the amino acids starting node A and its neighbourhood N ^k_s = 1 was defined. Heuristic was known from the generalized AMBER force fields. One per cent distance factors is evaporated from the node covered up to now, i.e. node A − 1 initially has approximately 1 distance factor value which after evaporation is left to 0.082. Optimized path covered after finding each next node to be visited upon is calculated and thus its length, i.e. A − 1 the optimized path is A − 1 where length is equal to 1. The next node to be moved upon is node 1.
2.
As the next node G to be moved upon is not equal to key node (i.e. end node), the amino acids starting node are now node 1 and the above steps are again followed.
3.
Applied random proportional rule to decide the next node of ant, probabilities of visiting the ant from node to node, the process is repeated till the ant reaches the destination node END_AA.
4.
As node covered up to now = i which is not equal to Pframe = {M, C, G, T, E), the amino acid again starts from the start node i.
5.
As node covered up to now = (M, C, G, T, E). The ant had now covered one full path starting from node 1 to node END_AA. The amount of distance factors deposited on each edge in the traversed path up to now is calculated, and the net pheromone is updated on each edge in the path equal to amount left after evaporation of distance factors + amount deposited after path traversed.
6.
The change in heuristic (GAFF) is calculated, i.e. Δ n ^j_i = n _ij/C ^k_i .
7.
As node covered until now = (M, C, G, T, E) which is not equal to Pframe = (M, C, G, T, E), the ant again starts from the start node i.
8.
The process is continued till all nodes are covered.
9.
When node covered = Pframe, the strength of each path was calculated, for example, for the path (M, C, G, T, E), and the strength is calculated as I − 1 = (final pheromone value) × (final heuristic value). Similarly, for the others edges in the path we calculate edge strengths. Finally, adding all edge’s strength in the path gives the final value for the path strength (Table 1)
Table 1 Probability-based result summary of different moves of an ant path, here we use PDB:4BEY [39] PDB file (night blindness causing G90D rhodopsin) for calculation of protein folding pattern using ant colony optimization algorithms
Full size table

Further, the algorithm using was compared and validated using PDB:4BEY protein. There was a good correlation in the results of observed and the experimental results. Figure 2a represents the folding pattern priority on the basis of nearest distance and favourable force fields using ant colony optimization (ACO) where different colours represent amino acids of small peptide (PDB:4BEY) and their M to E movement. The Fig. 2b represents the folding pattern of PDB:4BEY on the basis of nearest distance and favourable force fields using ant colony optimization (ACO) as in 2a in graphical form, where blue colour represents MET (node 1), cyan colour represents CYS (node 2), green colour represents GLY (node 3), orange colour represents THR (node 4), and red colour represents GLU (node 5) amino acids. The best optimized and prioritized folding paths are given in Table 2 covering all the residues of PDB:4BEY.

Table 2 Independent path’s strength versus priority

Full size table

The final path according to the total path strength and priority is shown in Table 2. The path having the maximum combined strength of pheromone and heuristic has been given the highest priority.

A Perl-based application protein folding energy-based recognition tool (PFEBRT) was developed as shown in Fig. 3a, b to determine the protein folding pattern of the protein using ant colony optimization algorithms. With this, users can easily select desired protein in PDB format and generate coordinates of given PDB file and save it into analysed file folder. In this work, a crucial role is played by CPU time. The maximum number of local minima search of sequences using ant colony algorithm is improved with respect to time and minimum number of CPU usage.

4 Conclusion

We have developed an ant colony-based algorithm and implemented in Perl language for protein folding. Using this application, the conformation of protein folding states, intermediates weights and pheromone present along the protein folding can be determined using amino acids interaction, ensembles and force fields. Protein folding is non-deterministic polynomial-time hard (NP hard) problem, using PDB files identification of protein conformation and folding process can be done. The ant colony meta-heuristic optimization is one of the few algorithms that can help to gain an atomic level insight into the conformation of protein folding states and intermediates weights and pheromone present along the protein folding. It finds a more optimized path of folding states. Development of ACO algorithms is more realistic and CPU-based models for protein structure folding pattern, i.e. NP hard problem.

References

Dorigo M (2005) Ant colony optimization theory: a survey. Elsevier 344:243–278
MathSciNet MATH Google Scholar
Kuwajima K (1989) The molten globule state as a clue for understanding the folding and cooperativity of globular-protein structure. Proteins Struct Funct Genet 6:87–103
Article Google Scholar
Dorigo M (1996) The ant system: optimization by a colony of cooperating agents. IEEE Trans 26:29–41
Google Scholar
Dorigo M (1997) Ant colony system: a cooperative learning approach to the traveling salesman problem. IEEE Trans Evolu Comput 1:53–66
Article Google Scholar
Dorigo M (1997) Ant colonies for the traveling salesman problem. Bio-Systems 43:73–81
Article Google Scholar
Dorigo M, Maniezzo V, Colorni A (1991) Positive feedback as a search strategy. Tech rep., pp 91–116
Dorigo M, Di Caro G (1999) New ideas in optimization. In: Corne D, Dorigo M, Glover F (eds) New ideas in optimization. McGraw-Hill, New York, pp 63–76
Google Scholar
Bastolla U, Fravenkron H, Gestner E, Grassberger P, Nadler W (1998) Testing a New Monte Carlo algorithm for the protein folding problem. Proteins 32:52–66
Article Google Scholar
Georgopoulos C, Liberek K, Zylicz M, Ang D (1994) Heat-shock proteins in biology and medicine. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, pp 209–249
Google Scholar
O’Toole EM, Panagiotopoulos AZ (1992) Monte Carlo simulation of folding transitions of simple model proteins using a chain growth algorithm. J Chem Phys 97:8644–8652
Article Google Scholar
Ramakrishnan R, Ramachandran B, Pekny JF (1997) A dynamic Monte Carlo algorithm for exploration of dense conformational spaces in heteropolymers. J Chem Phys 106:2418–2424
Article Google Scholar
Irback A (1998) Monte Carlo approach to biopolymers and protein folding. World Scientific, Singapore, pp 98–109
Google Scholar
Sali A, Shakhnovich E, Karplus M (1994) How does a protein fold? Nature 369:248–251
Article Google Scholar
Kim PS, Baldwin RL (1990) Intermediates in the folding reactions of small proteins. Annu Rev Biochem 59:631–660
Article Google Scholar
Backofen R (2001) The protein structure prediction problem: a constraint optimization approach using a new lower bound. Springer 6:223–255
MathSciNet MATH Google Scholar
Richards FM (1977) Areas, volumes, packing, and protein structures. Annu Rev Biophys Bioeng 6:151–176
Article Google Scholar
Chikenji G, Kikuchi M, Iba Y (1999) Multi-self-overlap ensemble for protein folding: ground state search and thermodynamics. ARXIV 27:1–4
Google Scholar
Dill KA, Fiebig KM, Chan HS (1993) Cooperativity in protein-folding kinetics. Proc Natl Acad Sci USA 90:1942–1946
Article Google Scholar
Beutler T, Dill K (1996) A fast conformational search strategy for finding low energy structures of model proteins. Protein 5:2037–2043
Article Google Scholar
Yue K, Dill KA (1995) Forces of tertiary structural organization in globular proteins. Proc Natl Acad Sci USA 92:146–150
Article Google Scholar
Backofen R, Will S (2003) A constraint-based approach to structure prediction for simplified protein models that outperforms other existing methods. In: Proceedings of XIX international conference on logic programming, pp 49–71
Torrie GM, Valleau JP (1977) Nonphysical sampling distributions in MC free energy estimation: umbrella sampling. J Comput Phys 23:187–199
Article Google Scholar
Berg BA, Neuhaus T (1992) Multicanonical ensemble: a new approach to simulate first-order phase transitions. Phys Rev Lett 68:9–12
Article Google Scholar
Plaxco KW, Simons KT, Baker D (1998) Contact order, transition state placement and the refolding rates of single domain proteins. J Mol Biol 277:985–994
Article Google Scholar
Hoos HH, Stützle T (2004) Stochastic local search: foundations and applications. Elsevier, Amsterdam, pp 1–156
MATH Google Scholar
Krasnogor N, Pelta D, Lopez PM, Mocciola P, de la Canal E (1998) Genetic algorithms for the protein folding problem: a critical view. In: Proceedings of engineering of intelligent systems. ICSC Academic Press, pp 353–360
Patton AWP, Goldman E (1995) A standard GA approach to native protein conformation prediction. In: Proceedings of the 6th international conference in genetic algorithms Morgan Kaufmann Publishers, pp 574–581
Unger R, Moult J (1993) Genetic algorithms for protein folding simulations. J Mol Biol 231:75–81
Article Google Scholar
Unger R, Moult J (1993) A genetic algorithm for three dimensional protein folding simulations. In: Proceedings of the 5th international conference on genetic algorithms Morgan Kaufmann Publishers, pp 581–588
Hsu HP, Mehra V, Nadler W, Grassberger P (2003) Growth algorithm for lattice heteropolymers at low temperatures. J Chem Phys 51:118–444
Google Scholar
Bin W, Zhongzhi S (2011) An ant colony algorithm based partition algorithm for TSP. Chin J Comput 24:1328–1333
MathSciNet Google Scholar
Gambardella LM, Dorigo M (1999) Ant colonies for the quadratic assignment problem. J Oper Res Soc 50:167–176
Article MATH Google Scholar
Shmygelska A, Hernandez R, Hoos H H (2002): An ant colony optimization algorithm for the 2d hp protein folding problem. In: Proceedings of the 3rd international workshop on ant algorithms, pp 40–52
Shmygelska A, Hoos HH (2005) An ant colony optimization algorithm for the 2d and 3d hydrophobic polar protein folding problem. BMC Bioinform 30:97–112
Google Scholar
He LL, Shi F, Zhou HB (2011) Application of improved ant colony optimization algorithm to the 2D HP model. Wuhan Univ J (Nat Sci Edn) 51:33–38
MathSciNet Google Scholar
Xudong Wu (2012) A two-stage ant colony optimization algorithm for the vehicle routing problem with time windows. IJACT 4:485–491
Google Scholar
Liu Fang (2012) A dual population parallel ant colony optimization algorithm for solving the travelling salesman problem. JCIT 7:66–74
Google Scholar
Wang J, Wolf RM, Caldwell JW, Kollman PA, Case DA (2004) Development and testing of a general amber force field. J Comput Chem 25:1157–1174
Article Google Scholar
Singhal A, Ostermaier MK, Vishnivetskiy SA, Panneels V, Homan KT, Tesmer JJ, Veprintsev D, Deupi X, Gurevich VV, Schertler GF, Standfuss J (2013) Insights into congenital stationary night blindness based on the structure of G90D rhodopsin. EMBO Rep 14:520–526
Article Google Scholar

Download references

Author information

Authors and Affiliations

School of Biotechnology, Gautam Buddha University, Greater Noida, Uttar Pradesh, India
Aman Chandra Kaushik & Shakti Sahi

Authors

Aman Chandra Kaushik
View author publications
You can also search for this author in PubMed Google Scholar
Shakti Sahi
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Shakti Sahi.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 29 kb)

Rights and permissions

Reprints and permissions

About this article

Cite this article

Kaushik, A.C., Sahi, S. Biological complexity: ant colony meta-heuristic optimization algorithm for protein folding. Neural Comput & Applic 28, 3385–3391 (2017). https://doi.org/10.1007/s00521-016-2252-5

Download citation

Received: 19 June 2015
Accepted: 19 February 2016
Published: 07 March 2016
Issue Date: November 2017
DOI: https://doi.org/10.1007/s00521-016-2252-5

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Biological complexity: ant colony meta-heuristic optimization algorithm for protein folding

Abstract

Similar content being viewed by others

Protein tertiary structure prediction by integrating ant colony optimization with path relinking and structure knowledge

New Ant Colony Optimization Including Weights

The 2D and 3D Protein Sequencing Implementation Using Ant Colony Optimization Algorithm