Protein folding optimization based on 3D off-lattice model via an improved artificial bee colony algorithm

Li, Bai; Lin, Mu; Liu, Qiao; Li, Ya; Zhou, Changjun

doi:10.1007/s00894-015-2806-y

Protein folding optimization based on 3D off-lattice model via an improved artificial bee colony algorithm

Original Paper
Published: 17 September 2015

Volume 21, article number 261, (2015)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Journal of Molecular Modeling Aims and scope Submit manuscript

Protein folding optimization based on 3D off-lattice model via an improved artificial bee colony algorithm

Download PDF

Bai Li^1,2,
Mu Lin³,
Qiao Liu^2,4,
Ya Li⁵ &
…
Changjun Zhou⁶

1678 Accesses
22 Citations
Explore all metrics

Abstract

Protein folding is a fundamental topic in molecular biology. Conventional experimental techniques for protein structure identification or protein folding recognition require strict laboratory requirements and heavy operating burdens, which have largely limited their applications. Alternatively, computer-aided techniques have been developed to optimize protein structures or to predict the protein folding process. In this paper, we utilize a 3D off-lattice model to describe the original protein folding scheme as a simplified energy-optimal numerical problem, where all types of amino acid residues are binarized into hydrophobic and hydrophilic ones. We apply a balance-evolution artificial bee colony (BE-ABC) algorithm as the minimization solver, which is featured by the adaptive adjustment of search intensity to cater for the varying needs during the entire optimization process. In this work, we establish a benchmark case set with 13 real protein sequences from the Protein Data Bank database and evaluate the convergence performance of BE-ABC algorithm through strict comparisons with several state-of-the-art ABC variants in short-term numerical experiments. Besides that, our obtained best-so-far protein structures are compared to the ones in comprehensive previous literature. This study also provides preliminary insights into how artificial intelligence techniques can be applied to reveal the dynamics of protein folding.

The Application of Artificial Bee Colony Algorithm in Protein Structure Prediction

Improved Bees Algorithm for Protein Structure Prediction Using AB Off-Lattice Model

Differential Evolution Multi-Objective for Tertiary Protein Structure Prediction

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Introduction

Proteins are large biological molecules or macromolecules consisting of amino acid residues that are connected by peptide bonds [1]. Understanding the functional mechanism of proteins, known as the building blocks of life, is essential in biological sciences and presents an enormous impact on enzyme engineering pathology, medicine, and pharmaceutics [2]. The folded structures of proteins in a cell provide the information necessary for studying the functions of proteins at the molecular level [3, 4].

Various experimental techniques have been developed to identify protein structures [5]. Up to February 25, 2013, approximately 81,700 protein structures had been deposited in the Protein Data Bank (PDB) database; most of these structures were determined using X-ray diffraction (88.8 %) and nuclear magnetic resonance spectroscopy (10.5 %) [6]. However, strict laboratory requirements and heavy operating burdens have limited the applications of the experimental techniques, making it difficult to keep track of deposited structures in the same pace of primary structures (i.e., amino-acid sequences) recovery [2, 3, 7]. In 2011, the number of experimentally determined secondary/tertiary structures was twofold lower than the number of undetermined protein structures [8]. Hence, computation-based techniques have been used to optimize protein structures or to simulate protein folding processes.

Protein folding in computational biology involves calculating protein conformation by arranging a sequence of basic structural elements. Christian Anfinsen’s theory, which states that all information needed to predict the native structure of a protein is encoded in its primary sequence, has largely contributed to the development in this research area [9, 10]. Researchers [11, 12] have used this theory as a basis to analyze native structures with minimal free energy. However, solving such a problem remains challenging for complicated protein models [13]. A key issue here is to what extent the trivial mechanisms in protein folding should be omitted [14]. Therefore, models with reduced complexity (e.g., [15–18]) have been developed by utilizing parts of the protein properties to avoid the large computational cost associated with an all-atom model.

In the present study, an off-lattice model introduced by Stillinger et al. [18] is utilized to simplify the protein folding scheme. All the amino acid residues in a sequence can be classified as hydrophobic or hydrophilic on the basis of the K-D method [19]. These “binarized” residues are connected by unbendable but free-to-rotate chemical bonds. Considering potentials among particles, we utilize a predefined criterion to calculate the free energy of a given conformation. Assuming that the native conformation of a protein is the most stable structure with minimal free energy, we can simulate the protein folding process through minimizing the free energy criterion.

The original protein folding scheme can be transformed into a numerical optimization problem by using the off-lattice model. Considering the NP-hard property of such optimization problems [20], researchers have developed various optimization techniques. For example, Kim et al. [13] proposed a conformational space annealing optimizer, Zhang et al. [21] a genetic tabu search algorithm, Chen and Huang [22] a heuristic algorithm, Wang et al. [23] a chaotic artificial bee colony algorithm, and Liang an annealing contour Monte Carlo algorithm, among many others [24–44]. However, previous research studies have focused overly on developing optimization algorithms, leaving the original protein structure folding mission ignored. In fact, large numbers of new optimization methods are being proposed every day, which outperform the relatively old ones. Therefore, the impact of an optimizer is commonly not anticipated to increase with time in the long run. In this sense, we believe that research efforts should be exerted on the concerned biological problem so as to make long-lasting contributions.

Previous publications in this broad research area suffer from three typical drawbacks from the authors’ viewpoint. First is the lack of algorithm validation. Only the best-so-far protein structure solutions were reported without sufficient numerical experiments or theoretical analyses to support the proposed optimization algorithm. Second is that artificial protein sequences (e.g., Fibonacci sequences) or real sequences that are relatively short are considered in simulations. At this point, ref. [27] indicated that real sequence formations are far more realistic than those artificial ones, thus simulations on the Fibonacci sequences cannot fully verify the efficacy of the utilized optimization algorithm. Third is the lack of insights into the biological problem: the protein folding mission is regarded as a general numerical optimization problem and few in-depth analysis or discussions about the obtained results are attached. For the assistance of understanding, a list of the aforementioned references is shown in Fig. 1. In addition to the three common drawbacks mentioned above, the obtained solution vectors were seldom made public among the previous publications, rendering their calculated solutions impossible to validate.

The present study aims to overcome the typical limitations mentioned above. A balance-evolution artificial bee colony (BE-ABC) algorithm [31, 45] is applied for protein structure optimization. Compared with the conventional ABC algorithm and other state-of-the-art variants, BE-ABC algorithm utilizes parameters/variables in the algorithm framework as guidelines to adaptively adjust search intensity. Moreover, an overall degradation procedure is used to avoid premature convergence. A benchmark case set for protein structure optimization that focuses only on real protein sequences is systematically formulated. In addition to reporting the best-so-far optimization results, we also strictly compare the short-term convergence performance of BE-ABC algorithm to those of some state-of-the-art ABC variants. Also, innovative insights into the protein folding problem are presented. The details of our optimized best-so-far structures are released for the convenience of validation.

The remainder of this paper is organized as follows: the next section introduces the 3D off-lattice protein model, then the principle of BE-ABC algorithm is briefly presented. Results are presented next followed by further discussions and conclusions.

Three-dimensional off-lattice protein model

The off-lattice protein model was initially developed to consider 2D folding problems and was extended to deal with 3D scenarios where additional torsional energy contributions of each bond are taken into account [14]. In the present study, a 3D off-lattice model is used to fold proteins in a relatively flexible but nontrivial extent.

The remainder of this section concerns how a structural protein can be uniquely determined by a set of bond/torsional angle parameters and how the corresponding free-energy value can be calculated.

Structure formulation

The off-lattice model is a coarse-grained one which holds the viewpoint that the main driving forces to form protein structures are the hydrophobic interactions among amino acid residues [18]. Thus, all 20 types of amino acid residues are classified as hydrophobic residues (amino acids I, V, L, P, C, M, A, and G represented by letter “A”) and hydrophilic residues (amino acids D, E, F, H, K, N, Q, R, S, T, W, and Y represented by letter “B”) [19, 25, 27]. All the AB-binarized amino acid residues are sequentially connected by unit-length chemical bonds to form a structural chain in the 3D space. This subsection introduces how the unique structure of a chain with N residues can be specified by (2N − 5) angle parameters [θ ₁, …, θ _N − 2, β ₁, …, β _N − 3]_{1 × (2N − 5)}, which include (N − 2) bond angles and (N − 3) torsional angles.

Locations of amino acid residues in the space are determined as follows [23]:

$$ \left({x}_i,{y}_i,{z}_i\right) = \left\{\begin{array}{c}\hfill \left(0,0,0\right)\kern7.25em i=1\hfill \\ {}\hfill\ \left(0,1,0\right)\kern7.25em i=2\hfill \\ {}\hfill \left( \cos \left({\theta}_1\right), \sin \left({\theta}_1\right)+1,0\right)\kern1.5em i=3\hfill \\ {}\hfill \left(\begin{array}{l}{x}_{i-1}+ \cos \left({\theta}_{i-2}\right)\cdot \cos \left({\beta}_{i-2}\right),\\ {}{y}_{i-1}+ \sin \left({\theta}_{i-2}\right)\cdot \cos \left({\beta}_{i-2}\right),\\ {}{z}_{i-1}+ \sin \left({\beta}_{i-2}\right)\end{array}\right)\ 4\le i\le N\hfill \end{array}\right.. $$

(1)

For normalization, the first three amino acid particles are defined in the plane z = 0. The locations of subsequent particles (i.e., when i ≥ 4) can then be calculated recursively. Specifically, the location of the ith particle is based on that of the (i − 1)th particle (i ≥ 4). Figure 2 shows how the location of a protein sequence AABA is determined by the angle vector [θ ₁, θ ₂, β ₁].

Free energy computation

Once a unique structure can be specified by a set of angular parameters, the corresponding free energy of the protein structure can be calculated.

In the 3D off-lattice model, the free energy function Energy can be defined as:

$$ Energy\left(\left[{\theta}_1,\dots, {\theta}_{N-2},{\beta}_1,\dots, {\beta}_{N-3}\right]\right)={\displaystyle \sum_{i=1}^{N-2}\frac{1- \cos \left({\theta}_i\right)}{4}}+4{\displaystyle \sum_{i=1}^{N-2}{\displaystyle \sum_{j=i+2}^N\left[{r_{ij}}^{-12}-C\left({\xi}_i,{\xi}_j\right){r_{ij}}^{-6}\right]}}, $$

(2)

where ξi reflects the binary property of the ith particle in the sequence, r _ij denotes the distance between residues i and j in the 3D space, and C(ξi, ξj) represents the interaction between these two particles. If particle i is hydrophilic, then ξ _i = 1; otherwise, ξ _i = − 1. r _ij and C(ξi, ξj) can be defined as follows:

$$ {r}_{ij}=\sqrt{{\left({x}_i-{x}_j\right)}^2+{\left({y}_i-{y}_j\right)}^2+{\left({z}_i-{z}_j\right)}^2}. $$

(3)

$$ C\left({\xi}_i,{\xi}_j\right)=\frac{1}{8}\left(1+{\xi}_i+{\xi}_j+5{\xi}_i{\xi}_j\right). $$

(4)

The free energy in Eq. (2) consists of two items that respectively represent intermolecular interactions among residues and intermolecular interactions between the peptide chain and surrounding solvent molecules. The first sum in Eq. (2) runs over the (N − 2) angles θ _i ∈ (−180^∘, 180^∘] of successive bond vectors. This item represents the bending energy and the coupling is ferromagnetic, i.e., it penalizes energy to bend the chain [31]. The second item partially competes with the bending barrier depending on the distance between nonadjacent particles along the chain [14].

One angle vector [θ ₁, …, θ _N − 2, β ₁, …, β _N − 3] represents one candidate structure and Energy([θ ₁, …, θ _N − 2, β ₁, …, β _N − 3]) is the corresponding free energy value, with θ _i, β _j ∈ (−180^∘, 180^∘] for any 1 ≤ i ≤ N − 2, 1 ≤ j ≤ N − 3.

In this way, an protein folding mission is transformed into a constrained numerical optimization problem. Given a protein sequence, we aim to find the optimal angle vector associated with the minimal free energy value, i.e.,

$$ {\left[{\theta}_1,\dots, {\theta}_{N-2},{\beta}_1,\dots, {\beta}_{N-3}\right]}^{\ast }= \arg \left(\underset{-{180}^{\circ }<{\theta}_i,{\beta}_j<{180}^{\circ }}{ \min}\left( Energy\left(\left[{\theta}_1,\dots, {\theta}_{N-2},{\beta}_1,\dots, {\beta}_{N-3}\right]\right)\right)\right),1\le i\le N-2,\ 1\le j\le N-3. $$

The following section introduces BE-ABC algorithm to search for that optimal solution.

BE-ABC algorithm

Consistent with Anfinsen’s theories of protein structure stability when free energies of native protein structures are at their lowest, this section focuses on how BE-ABC algorithm searches for the solution with minimum free energy.

Motivations

The conventional ABC algorithm is a well-known swarm intelligence optimization method inspired by the foraging behavior of bees [46]. ABC algorithm is featured by the implementation of both local exploitation and global exploration procedures during the optimization process [47]. However, local search accuracy of the ABC algorithm is not satisfactory [48], causing large numbers of remedies proposed for that imperfection. Most of the proposed ABC variants have considered adjusting the local exploitation equations to achieve good search intensity. To that end, three ways have been prevailingly utilized: (i) to adopt a new principle from the outside world (e.g., [23, 49, 50]), (ii) to integrate ABC with other metaheuristics and (iii) to adjust the local search intensity in a self-adaptive manner (e.g., [7, 45, 51]). Commonly it is difficult to intuitively judge whether one ABC variant outperforms another unless statistically significant results based on comparative numerical experiments are available. Generally speaking, the absence of in-depth mathematical analyses may have made this research area as prosperous as it is. In contrast with the first two ways of modification, self-adaptive ABC variants (i.e., methods based on the third way) are intuitively understandable (because no complicated outside-world principles are involved) and remain the original framework of the conventional ABC algorithm. On the other hand, one may notice that the effect to execute a local/global search may be different during different periods of the entire optimization process, making it sensible to strike an intensity balance between a global search and a local one.

Adjusting the search intensity adaptively is considered in BE-ABC algorithm. Specifically, when evolutions are made with ease around one employed bee, the subsequent search accuracy is intensified to expect further evolutions there. On the contrary, when the search efficiency is bad there, the subsequent search should be executed in a relatively wide scale. Through this, it is possible that a global search appears local and a local exploitation appears global. Then there is no longer any clear gap between the global and local searches during the evolutions. Since a balance between the global and local searches is ideally expected, we name this proposal “balance evolution”.

In more detail, BE-ABC algorithm is different from the conventional ABC algorithm in two aspects. First, adaptively changed multipliers are added in the global and global search equations to manipulate the search intensity. Those multipliers are determined on the basis of the optimization efficiency in a current iteration. Second, the conventional re-initialization procedure is improved by an overall degradation strategy, which aims to re-initialize employed bees simultaneously during one iteration (instead of re-initializing only one in each iteration). In BE-ABC algorithm, the optimization efficiency is evaluated through the unsuccessful trail attempts. In the next subsection, the principle of BE-ABC algorithm is presented.

Algorithm principle

BE-ABC algorithm involves three types of bees: unemployed, employed, and onlooker. Unemployed bees search in a fully random manner in space while employed bees form a cooperative group to search in a global manner. Onlooker bees follow “qualified” employed bees to exploit locally. A qualified employed bee is one that finds a high-quality nectar source [31]. The location of a nectar source resembles a candidate solution to the concerned free energy minimization problem and the nectar quantity is evaluated by the function Energy(⋅). Considering that the search space (wherein all nectar sources exist) is not infinite, we impose bounds on the solution vectors: let X = [θ ₁, …, θ _N − 2, β ₁, …, β _N − 3]_{1 × (2N − 5)} represent a general candidate solution, then Lb ≤ X ≤ Ub, where Ub and Lb denote the upper/lower bound. In this current study, we have Ub = (180^∘, …, 180^∘)_{1 × (2N − 5)} and Lb = (−180^∘, …, − 180^∘)_{1 × (2N − 5)}. To simplify the presentation, in the remaining of this paper, we refer to X as (X ¹, X ², ⋯, X ^Dim), Ub as (U ¹, U ², ⋯, U ^Dim) and Lb as (L ¹, L ², ⋯, L ^Dim) when Dim = 2N − 5.

A bee colony consists of SN bees, one half of which are unemployed bees and the other half onlooker bees. Before the optimization starts, those $ \raisebox{1ex}{$SN$}\!\left/ \!\raisebox{-1ex}{$2$}\right. $ unemployed bees are randomly initialized in the search space. The following equation shows how the jth element of the ith employed bee’s location X_i is generated:

$$ \begin{array}{l}{X}_i^j\leftarrow {L}^j+ rand\left(0,1\right)\cdot \left({U}^j-{L}^j\right),\kern0.5em \\ {}\kern6.25em i=1,2,\dots, \raisebox{1ex}{$SN$}\!\left/ \!\raisebox{-1ex}{$2$}\right.,\ j=1,2,\dots, Dim,\end{array} $$

(5)

where rand(0, 1) denotes a random number (range of 0 to 1) obeying uniform distribution. After the initialization procedure, these unemployed bees directly become employed bees. Then an iterative optimization process gets started.

In each cycle of iteration, $ \raisebox{1ex}{$SN$}\!\left/ \!\raisebox{-1ex}{$2$}\right. $ employed bees try new positions within the search space. The new positions are generated by sharing locations with each other. For instance, the ith employed bee X_i is assumed at (X ¹_i , X ²_i , ⋯, X ^Dim_i ) and the to-be-computed search location X ^∗_i is denoted by (X ^∗ 1_i , X ^∗ 2_i , ⋯, X ^∗ Dim_i ). First, as many as trial(i) out of all Dim dimensions in X_i are randomly selected. trial(i) is an integral scalar (range of 1 to Dim) associated with the ith employed bee, which is formally introduced below. Second, the element of each selected dimension is mutated through a crossover procedure. The following equation shows how the ith employed bee takes usage of the randomly chosen kth employed bee’s location to update the jth dimension:

$$ \begin{array}{l}{X^{\ast}}_i^j\leftarrow {X}_i^j+ rand\left(-1,1\right)\cdot \left({X}_k^j-{X}_i^j\right)\cdot \frac{trial(i)}{trial(i)+ trial(k)},\ \\ {}\kern16em k\in \left\{1,2,\dots, \raisebox{1ex}{$SN$}\!\left/ \!\raisebox{-1ex}{$2$}\right.\right\},\ k\ne i.\end{array} $$

(6)

When X ^∗_i is available, Energy(X ^∗_i ) is immediately compared with Energy(X _i). When Energy(X ^∗_i ) < Energy(X _i), the ith employed bee flies to the new location X ^∗, i.e., X_i is updated as X ^∗_i . In addition, trial(i) should be reset to 1. When Energy(X ^∗_i ) ≥ Energy(X _i), the ith employed bee remains at X_i but trial(i) adds one. Based on the principle by which the principle trial(i) is calculated, it is notable that trial(i) records the consecutive unsuccessful trial attempts of the ith employed bee.

When all the employed bees have updated their locations, onlooker bees act accordingly. A roulette selection procedure guides those $ \raisebox{1ex}{$SN$}\!\left/ \!\raisebox{-1ex}{$2$}\right. $ onlooker bees to choose qualified employed bees to search around. A probability index P is calculated according to Eqs. (7) and (8) to reflect the search quality of the employed bees [46].

$$ P(i)=\frac{fitness(i)}{{\displaystyle {\sum}_{j=1}^{\raisebox{1ex}{$SN$}\!\left/ \!\raisebox{-1ex}{$2$}\right.} fitness(j)}},\ i=1,2,\dots, \raisebox{1ex}{$SN$}\!\left/ \!\raisebox{-1ex}{$2$}\right., $$

(7)

$$ fitness(i)=\left\{\begin{array}{ll}\frac{1}{1+ Energy\left({\mathbf{X}}_i\right)}\hfill & \kern2.75em \mathrm{if}\ Energy\left({\mathbf{X}}_i\right)\ \ge 0\hfill \\ {}1- Energy\left({\mathbf{X}}_i\right)\hfill & \kern2.75em \mathrm{if}\ Energy\left({\mathbf{X}}_i\right) < 0\hfill \end{array}.\right. $$

(8)

Let’s take the ith onlooker bee for example to see how its search location can be generated. In order to find an employed bee to follow around, a comparison is made between rand(0, 1) and P(1) first. When P(1) ≥ rand(0, 1), the ith onlooker bee will search around the first employed bee; otherwise, an additional comparison between rand(0, 1) (another random number this time) and P(2) is performed. If all the P(j) $ \left(j=1,2,\dots, \raisebox{1ex}{$SN$}\!\left/ \!\raisebox{-1ex}{$2$}\right.\right) $ happen to be smaller than rand(0, 1), the process is repeated from the beginning P(1) again until one P(j) that satisfies P(j) ≥ rand(0, 1) is finally found. That finally found jth employed bee is chosen by the ith onlooker bee to search around.

Contrary to the employed-bee phase, the crossover and mutation process involves only one (randomly chosen) dimension of the onlooker bee’s solution vector. The following equation shows how the ith onlooker bee utilizes the mth employed bee to generate a new search location around the jth in the kth dimension:

$$ \begin{array}{l}{Y}_j^k\leftarrow {X}_j^k+ rand\left(-1,1\right)\cdot \left({X}_m^k-{X}_j^k\right)\cdot \frac{trial(j)}{trial(j)+ trial(m)},\ \\ {}\kern12.25em m\in \left\{1,2,\dots, \raisebox{1ex}{$SN$}\!\left/ \!\raisebox{-1ex}{$2$}\right.\right\},\ m\ne j.\end{array} $$

(9)

When the location of the onlooker bee Y _i = (X ¹_j , …, X ^k − 1_j , Y ^k_j , X ^k + 1_j , …, X ^Dim_j ) is determined, Energy(X _j) and Energy(Y _i) are compared. When Energy(Y _i) < Energy(X _j), the jth employed bee updates its current location X _j as Y _i. Also, trial(j) is reset to 1. When Energy(X _j) < Energy(Y _i), trial(j) adds one. When all the onlooker bees have tried around their chosen employed bees, a re-initialization phase at the end of each iteration is followed.

Two things should be accomplished in the re-initialization phase. First, any trial(i) that exceeds Dim is set to Dim. Second, a criterion (i.e., $ \frac{2}{SN}{\displaystyle {\sum}_{i=1}^{\raisebox{1ex}{$SN$}\!\left/ \!\raisebox{-1ex}{$2$}\right.} trial(i)} $) is proposed to reflect the overall optimization efficiency and is compared with α _odr ⋅ Dim, where α _odr ∈ (0, 1) is a user-specified threshold. When $ {\alpha}_{odr}\cdot Dim<\frac{2}{SN}{\displaystyle {\sum}_{i=1}^{\raisebox{1ex}{$SN$}\!\left/ \!\raisebox{-1ex}{$2$}\right.} trial(i)} $, the entire employed bee swarm is considered not working efficiently to a degree of α _odr. Then, (100α _odr) % randomly chosen employed bees directly become unemployed bees, which will turn to employed bees at the beginning of the next iteration. The corresponding trial indices are reset to 1. When $ {\alpha}_{odr}\cdot Dim\ge \frac{2}{SN}{\displaystyle {\sum}_{i=1}^{\raisebox{1ex}{$SN$}\!\left/ \!\raisebox{-1ex}{$2$}\right.} trial(i)} $, the current iteration cycle is directly terminated.

When the iteration number reaches a user-specified threshold MCN, the entire optimization process is accomplished. A flowchart of BE-ABC algorithm is depicted in Fig. 3.

Results

We systematically design a series of simulations on real protein sequences (Table 1) that originated from the PDB database. All simulations involved in this section were carried out in a Matlab R2011b environment and executed on an Intel Core 2 Duo CPU with 2 GB RAM running at 2.53 GHz under Windows XP. In this work, α _odr is set to 0.9.

Table 1 Details of 13 benchmark amino acid sequences

Full size table

In this section, first, the convergence performance of BE-ABC algorithm is evaluated by comparing it to several ABC variants in short-term numerical experiments. Second, our best-so-far solutions optimized by BE-ABC algorithm are compared to the ones published in previous works. Third, the dynamic process of protein folding is preliminarily investigated through case studies.

Comparison of optimization performance among ABC variants

The convergence performance of BE-ABC algorithm is evaluated on all 13 real sequences. In order to show its advantage, BE-ABC algorithm was compared to the conventional ABC algorithm and its state-of-the-art variants, namely an improved ABC (I-ABC) [51], Gbest-guided ABC (Gbest-ABC) [48], chaotic ABC (C-ABC) [23], and internal feedback ABC (IF-ABC) [7]. In each of the 13 cases, the mentioned algorithms were tested repeatedly for 30 times under the condition that SN = 40. MCN was set to 5000 for relatively short sequences and 20,000 for sequences with more than 85 amino acids. Figures 4, 5, and 6 depict the evolutionary curves in three representative cases (which represent short, medium and long sequences respectively). Results of all these cases are collected in Table 2.

Table 2 Comparative optimization results (average, standard deviations, and time consumptions) obtained using BE-ABC and other ABC variants

Full size table

Comparison of structures optimized by BE-ABC algorithm with previous literature

This subsection compares the best-so-far structures optimized by BE-ABC algorithm with the structures reported in previous publications. Table 3 lists the lowest free energy values for the concerned 13 sequences obtained using different optimization techniques. Figure 7 illustrates the structures obtained using our BE-ABC algorithm. Details of the optimized structures are provided in Table 4.

Table 3 Comparative results of the lowest free energy values for 13 sequences achieved by an improved particle swarm optimization (I-PSO) algorithm [25], a hybrid optimization that combines PSO, genetic algorithm (GA), and Tabu search (TS) named PGATS [35], a multiple-populations GA-PSO (MPGPSO) algorithm [37], ABC algorithm [36], GA-based TS (GATS) algorithm [21, 27], and our BE-ABC algorithm.

Full size table

Folding process investigation

During the optimization process wherein BE-ABC searches for the lowest free energy, lower and lower values are sequentially obtained. Therefore, how the protein structure “migrates” during the optimization process can be observed, which may reveal critical insights into the protein folding process that occurs in nature.

Simulations were conducted on sequences 1BXL and 1AGT as examples. The results are depicted in Figs. 8 and 9.

Discussion

In this section, the simulation results shown in the preceding section are further analyzed.

Performance of BE-ABC algorithm

Through a comprehensive set of short-term numerical experiments, the optimization performance of BE-ABC algorithm is compared to other state-of-the-art ABC variants. The collected data in Table 2 indicate that BE-ABC algorithm outperforms the other ABC variants.

As depicted in Figures 4, 5, and 6, the evolution processes of BE-ABC algorithm do not remain the best among all six ABC variants in some early iterations. The origin of the problem might come from the following aspects of the algorithm. First, intuitively speaking, global search should be enhanced in the early iterations when evolutions are easy to make. During that period, elements in the trial vector are relatively small, yielding the global search scale of BE-ABC algorithm generally smaller than that of the conventional ABC algorithm under the same circumstance. In a simple instance with trial(i) = trial(k) = 1 regarding Eq. (6), the search scale determined by BE-ABC algorithm shrinks to half of that determined by the conventional ABC algorithm. When trial(k) is larger, the shrink becomes severer, which is not good for a global search. Thus the evolution processes based on BE-ABC algorithm commonly lag at the very beginning. Due to the same reason, when making an evolution is not that easy, the benefit of utilizing gradually emerges and that is why BE-ABC algorithm usually catches up with its competitors in the later iterations. The second aspect concerns the overall degradation strategy. Unlike that in the conventional ABC algorithm, BE-ABC algorithm requires that part of the employed bees re-initialized all at once, which happens under some certain circumstance, which is not likely to happen in the early iterations. In contrast, ABC algorithm re-initializes the employed bees one after another when they are judged as inefficient. In other words, the re-initialization procedure in BE-ABC algorithm seldom works during the early iterations. This may cause differences among the evolution curves. However, re-initializing the employed bees one after another (in the conventional ABC algorithm) is not efficacious when evolutions are difficult to make or when SN is set large. The long-lasting convergence performance (based on efficient re-initialization ability) of BE-ABC algorithm is clearly illustrated, especially in Fig. 6.

Moreover, Table 2 shows that BE-ABC algorithm usually consumes less time than IF-ABC and C-ABC algorithms do. This is because BE-ABC algorithm neither adds complicated outside-world strategy nor changes the conventional algorithm framework.

Performance of the optimized best-so-far structures

The best solutions obtained through BE-ABC are compared with those presented in previous literature. The blanks in Table 3 indicate that more cases than those in previous studies have been considered in this present study. In fact, the comparative studies listed in Table 3 constitute all the relevant studies that can be found, to the best of our ability. In nearly all of the cases, BE-ABC algorithm clearly outperforms its competitors.

Here, curious readers may ask how the best solutions were computed. It is a “mystery” how the best solutions in most of the previous publications were computed. In fact, there is not a unified standard that all the publications had followed to generate the best solutions. In our work, we terminated the optimization process when there had been no evolution for up to 5,000,000 consecutive iterations.

Although in some cases our obtained free-energy values are slightly lower than previously published ones, the optimized structures are completely different. The best-so-far structures reported in refs. [21] and [27] for cases 1CB3, 1BXL, and 1EDP are depicted in Fig. 10. Also in that figure, an index ΔE is utilized to reflect the error between their reported free energy and ours; another root of mean square error index RMSE (defined in Eq. (10)) measures the difference between two concerned solution vectors.

$$ RMSE\left(\mathbf{X},\mathbf{Y}\right)=\sqrt{\frac{1}{Dim}\left({\displaystyle {\sum}_{i=1}^{Dim}{\left({X}_i-{Y}_i\right)}^2}\right)}, $$

(10)

where X = [X₁, X ₂, …, X_Dim] and Y = [Y₁, Y ₂, …, Y _Dim].

Commonly in a natural protein structure, the hydrophobic residues tend to form a core with a minimum surface area that encounters water molecules. Thus such a hydrophobic core tends to be surrounded by hydrophilic residues so as to prevent encountering water molecules outside. In Fig. 7 (where the light-colored particles reflect hydrophobic residues while those dark-colored ones reflect hydrophilic residues), that characteristic appearing in nature can be reflected in nearly all of the simulated cases.

Performance of off-lattice model

In this research study, we have also investigated the protein folding process which is expressed on the basis of the off-lattice model. Assuming that the structure optimization process is the natural process for a protein structure to be folded, we observe that all the hydrophobic residues gradually assemble to become one central core surrounded by hydrophilic residues in cases 1BXL and 1AGT.

Although the off-lattice model roughly classifies all the amino acids in a sequence into two groups, some critical characteristics during the natural folding process can be observed in our artificial optimization process, possibly providing an initial guess for protein folding simulation with more complicated models.

Conclusions

In this paper, we have investigated protein folding optimization on the basis of a coarse-grained mode (i.e., the 3D off-lattice model) and an improved ABC algorithm. The underlying contributions of this paper are summarized as follows.

First, this work presents an application of a relatively new metaheuristic optimization technique. The optimization performance of BE-ABC algorithm is comprehensively evaluated by comparing it with several state-of-the-art ABC variants.

Second, 13 real protein sequences, rather than artificial sequences, are chosen for optimization. To the best of our knowledge, none of the previous works have conducted simulations in such a comprehensive manner. The simulated protein sequences and obtained optimization results form a benchmark case set, which can be used as a standard set by future researchers.

Last, the protein folding process is investigated on the basis of the off-lattice model through a viewpoint of evolution. Previous studies used their concerned optimization techniques to solve a numerical problem but quite few of those studies further investigated the efficiency of the off-lattice model. Simulation results in this study indicate that the off-lattice model provides a rough estimation of the folding process of proteins.

Investigations about how the optimized results benefit the area of molecular biology should be made through future studies. Effects are also needed to measure how close the optimized structures are to the existing ones in the PDB database and to reveal more details about the protein folding process.

References

Dehzangi A, Lyons J, Sharma A, Sattar A (2014) A segmentation-based method to extract structural and evolutionary features for protein fold recognition. IEEE/ACM Trans Comput Biol Bioinf 11(3):510–519
Sahu SS, Panda G (2010) A novel feature representation method based on Chou's pseudo amino acid composition for protein structural class prediction. Comput Biol Chem 34(5–6):320–327
Article CAS Google Scholar
Hendy H, Khalifa W, Roushdy M, Salem AB (2015) A study of intelligent techniques for protein secondary structure prediction. Int J Inf Mod Anal 4(1):3–12
Google Scholar
Xu Y, Xu D, Liang J (2007) Computational methods for protein structure prediction and modeling. Vol 1: Basic characterization. Biol Med Phys Biomed Eng. Springer, New York
Comellas G, Rienstra CM (2013) Protein structure determination by magic-angle spinning solid-state NMR, and insights into the formation, structure, and stability of amyloid fibrils. Ann Rev Biophys 42:515–536
Article CAS Google Scholar
Bagaria A, Jaravine V, Güntert P (2013) Estimating structure quality trends in the protein data bank by equivalent resolution. Comput Biol Chem 46:8–15
Article CAS Google Scholar
Li B, Li Y, Gong LG (2014) Protein secondary structure optimization using an improved artificial bee colony algorithm based on off-lattice model. Eng Appl Artif Intell 27:70–79
Article Google Scholar
Poole RK (2011) Globins and other nitric oxide-reactive proteins. Academic, San Diego
Anfinsen CB (1973) Principles that govern the folding of protein chains. Science 181(4096):223–230
Article CAS Google Scholar
Wooley JC and Ye Y (2007) A historical perspective and overview of protein structure prediction. Comput Meth Pro Struct Predic Model. Springer, New York, pp 1–43
Gibson KD, Scheraga HA (1967) Minimization of polypeptide energy. I. Preliminary structures of bovine pancreatic ribonuclease Speptide. Proc Natl Acad Sci U S A 58(2):420–427
Article CAS Google Scholar
Scott RA, Vanderkooi G, Tuttle RW, Shames PM, Scheraga HA (1967) Minimization of polypeptide energy, iii. Application of a rapid energy minimization technique to the calculation of preliminary structures of gramicidins. Proc Natl Acad Sci U S A 58(6):2204–2211
Article CAS Google Scholar
Kim SY, Lee SB and Lee J (2005) Structure optimization by conformational space annealing in an off-lattice protein model. Phys Rev E 72:011916
Bachmann M, Arkın H, and Janke W (2005) Multicanonical study of coarse-grained off-lattice models for folding heteropolymers. Phys Rev E 71:031906
Bryant SH, Lawrence CE (1993) An empirical energy function for threading protein sequence through the folding motif. Pro: Struct Func Bioinf 16(1):92–112
CAS Google Scholar
Levitt M, Warshel A (1975) Computer simulation of protein folding. Nature 253(5494):694–698
Article CAS Google Scholar
Dill KA (1985) Theory for the folding and stability of globular proteins. Biochemistry 24(6):1501–1509
Article CAS Google Scholar
Stillinger FH, Head-Gordon T, Hirshfeld CL (1993) Toy model for protein folding. Phys Rev E 48(2):1469–1477
Article CAS Google Scholar
Mount DW, Mount DW (2001). Bioinformatics: sequence and genome analysis (vol 2). Cold Spring Harbor Laboratory Press, New York
Unger R, Moult J (1993) Finding the lowest free energy conformation of a protein is an NP-hard problem: proof and implications. B Math Biol 55(6):1183–1198
Article CAS Google Scholar
Zhang X, Wang T, Luo H, Yang JY, Deng Y, Tang J, Yang MQ (2010) 3D Protein structure prediction with genetic tabu search algorithm. BMC Syst Biol 4(1):S6
Chen M, Huang WQ (2006) Heuristic algorithm for off-lattice protein folding problem. J Zhejiang Uni SCI B 7(1):7–12
Article CAS Google Scholar
Wang Y, Guo GD, Chen LF (2013) Chaotic artificial bee colony algorithm: a new approach to the problem of minimization of energy of the 3D protein structure. Mol Biol 47(6):894–900
Article CAS Google Scholar
Liang F (2004) Annealing contour monte Carlo algorithm for structure optimization in an off-lattice protein model. J Chem Phys 120(14):6756–6763
Article CAS Google Scholar
Chen X, Lv M, Zhao L and Zhang X (2011) An improved particle swarm optimization for protein folding prediction. Int J Inform Eng Electron Bus 3:1
Kim J, Straub JE and Keyes T (2007) Structure optimization and folding mechanisms of off-lattice protein models using statistical temperature molecular dynamics simulation: statistical temperature annealing. Phys Rev E 76: e011913
Wang T, Zhang X (2011) A case study of 3D protein structure prediction with genetic algorithm and Tabu search. Wuhan Uni J Nat Sci 16(2):125–129
Article Google Scholar
Kalegari DH, Lopes HS (2010) A differential evolution approach for protein structure optimisation using a 2D off-lattice model. Int J Bio-Inspired Comput 2(3):242–250
Article Google Scholar
Hsu HP, Mehra V, Grassberger P (2003) Structure optimization in an off-lattice protein model. Phys Rev 68(3):e037703
Irbäck A, Peterson C, Potthast F, Sommelius O (1997) Local interactions and protein folding: a three-dimensional off-lattice approach. J Chem Phys 107(1):273–282
Article Google Scholar
Li B, Chiong R, Lin M (2015) A balance-evolution artificial bee colony algorithm for protein structure optimization based on a three-dimensional off-lattice model”. Comput Biol Chem 54:1–12
Article Google Scholar
Li B, Gong L, andYao Y (2013) On the performance of internal feedback artificial bee colony algorithm (IF-ABC) for protein secondary structure prediction. In Proc 6th Int Conf Adv Comput Intell, Hangzhou, China, pp 33–38
Zhang X and Cheng W (2008) Protein 3D structure prediction by improved tabu search in off-lattice AB model. In Proc 2nd Int Conf Bioinf Biomed Eng, Shanghai, pp 184–187
Jana ND, Sil J (2013) Hybrid particle swarm optimization technique for protein structure prediction using 2D off-lattice model. Swarm Evolut Meme Comput 8298:193–204
Article Google Scholar
Zhou C, Hou C, Wei X, Zhang Q (2014) Improved hybrid optimization algorithm for 3D protein structure prediction. J Mol Model 20(7):1–12
Article CAS Google Scholar
Li Y, Zhou C, and Zheng X (2014) “The application of artificial bee colony algorithm in protein structure prediction. Bio-Inspired Comput Theories Appl 472:255–258
Zhou C, Hu T, Zhou S (2014) Protein structure prediction based on improved multiple populations and GA-PSO. Bio-Inspired Comput Theories Appl 472:644–647
Google Scholar
Benitez CM, Parpinelli RS and Lopes HS (2013) A heterogeneous parallel ecologically-inspired approach applied to the 3D-off-lattice protein structure prediction problem. In Proc 11th Conf Comput Intell, pp 592–597
Lin X and Zhang X (2014) Protein folding structure optimization based on GAPSO algorithm in the off-lattice model. In Proc IEEE Conf Bioinf Biomed, pp 43–49
Liu J, Sun Y, Li G, Song B, Huang W (2013) Heuristic-based tabu search algorithm for folding two-dimensional off-lattice model proteins. Comput Biol Chem 47:142–148
Jana ND, Sil J and Das S (2015) Improved Bees Algorithm for Protein Structure Prediction Using AB Off-Lattice Model. Mendel 2015. Springer, Heidelberg, pp 39–52
Fan J, Duan H, Xie H and Shi H (2014) Improved Biogeography-Based Optimization approach to secondary protein prediction. In 2014 International Joint Conference on Neural Networks (IJCNN), IEEE, pp 4223–4228
Doğan B, Ölmez T (2015) Modified Off-lattice AB model for protein folding problem using the vortex search algorithm. Int J Mach Learn Comput 5(4):329–333
Article Google Scholar
Kalegari DH and Lopes DH (2013) An improved parallel differential evolution approach for protein structure prediction using both 2D and 3D off-lattice models. In 2013 I.E. Symposium on Differential Evolution (SDE), IEEE, pp 143–150
Li B, Gong LG and Yang W (2014) An improved artificial bee colony algorithm based on balance-evolution strategy for unmanned combat aerial vehicle path planning. Sci World J doi:10.1155/2014/232704
Karaboga D, Basturk B (2007) A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm. J Glob Optim 39(3):459–471
Article Google Scholar
Li B, Chiong R, and Gong L G (2014) Search-evasion path planning for submarines using the artificial bee colony algorithm. In 2014 I.E. Congress on Evolutionary Computation, IEEE, pp 528–535
Zhu G, Kwong S (2010) Gbest-guided artificial bee colony algorithm for numerical function optimization. Appl Math Comput 217(7):3166–3173
Article Google Scholar
Gao W, Liu S, Huang L (2013) A novel artificial bee colony algorithm with Powell's method. Appl Soft Comput 13(9):3763–3775
Article Google Scholar
Kang F, Li J, Ma Z (2011) Rosenbrock artificial bee colony algorithm for accurate global optimization of numerical functions. Inf Sci 181(16):3508–3531
Article Google Scholar
Li G, Niu P, Xiao X (2012) Development and investigation of efficient artificial bee colony algorithm for numerical function optimization. Appl Soft Comput 12(1):320–332
Article Google Scholar

Download references

Acknowledgments

The authors are sincerely grateful to the anonymous reviewers, Mr. Xuebin Ji, Mr. Ligang Gong, Dr. Haibin Duan, Dr. Chengdong Pu, Prof. I-Fang Chung, Prof. Raymond Chiong, and Prof. Jaap Heringa for their comments and suggestions. This work is sponsored by the Fundamental Research Funds for the Central Universities under Grant YWF-14-SXXY-006 and Grant YWF-15-SXXY-004, the 6^th National College Students’ Innovation & Entrepreneurial Training Program under Grant 201210006050 as well as the National Natural Science Foundation of China under Grant 61370005.

Author information

Authors and Affiliations

College of Control Science and Engineering, Zhejiang University, Hangzhou, 310027, China
Bai Li
School of Advanced Engineering, Beihang University, Beijing, 100191, China
Bai Li & Qiao Liu
College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, 310027, China
Mu Lin
School of Instrument Science and Opto-electronic Engineering, Beihang University, Beijing, 100191, China
Qiao Liu
School of Mathematics and Systems Science & LMIB, Beihang University, Beijing, 100191, China
Ya Li
Key Laboratory of Advanced Design and Intelligent Computing, Dalian University, Dalian, 116622, China
Changjun Zhou

Authors

Bai Li
View author publications
You can also search for this author in PubMed Google Scholar
Mu Lin
View author publications
You can also search for this author in PubMed Google Scholar
Qiao Liu
View author publications
You can also search for this author in PubMed Google Scholar
Ya Li
View author publications
You can also search for this author in PubMed Google Scholar
Changjun Zhou
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Bai Li.

Appendix

Table 4 Best-so-far solution optimized by BE-ABC algorithm

Full size table

Rights and permissions

Reprints and permissions

About this article

Cite this article

Li, B., Lin, M., Liu, Q. et al. Protein folding optimization based on 3D off-lattice model via an improved artificial bee colony algorithm. J Mol Model 21, 261 (2015). https://doi.org/10.1007/s00894-015-2806-y

Download citation

Received: 18 June 2015
Accepted: 30 August 2015
Published: 17 September 2015
DOI: https://doi.org/10.1007/s00894-015-2806-y

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Protein folding optimization based on 3D off-lattice model via an improved artificial bee colony algorithm

Abstract

Similar content being viewed by others

The Application of Artificial Bee Colony Algorithm in Protein Structure Prediction

Improved Bees Algorithm for Protein Structure Prediction Using AB Off-Lattice Model

Differential Evolution Multi-Objective for Tertiary Protein Structure Prediction

Introduction