Keywords

1 Introduction

The study of protein structure helps us understand the function of proteins and to understand how they exercise their biological function, and understanding protein-protein interaction (or other molecules) is very important for biology, medicine and pharmacy [1]. In recent years, although the test methods for the determination of protein structures develop well, it’s still time-consuming, expensive, and not applied for some proteins difficult to crystallize. Therefore, we need to develop the theoretical analysis. With the development of computer technology, it has gradually become an important tool for processing large data of protein molecule.

Theoretical Prediction of protein three-dimensional structure is mainly divided into the following three steps: Firstly, the mathematical model proposed should reflect the interaction among the amino acid residues; Secondly, we have to establish a simple calculated energy function which can also distinguish correctly between natural proteins and other proteins based on the thermodynamic hypothesis, Thirdly, we have to find the appropriate global optimization methods on the corresponding model to find the minimum of energy function. Owing to the efforts of researchers, we have had lots of achievements about the above steps. There are two widely used models named hydrophobic-polar (HP) model [2] and AB off-lattice model [3]. The HP-lattice model uses two types of residues to present the amino acid chains which are hydrophobic (H) or non-polar residues and polar (P) or hydrophilic residues, and the residues on the vertices of stack cubic lattices are linked sequentially by unit-length chemical bonds [46]. This model is simple but it neglects local interactions which are important in protein folding [7]. The AB off-lattice model is more accurate than HP-lattice model, because the bond angles are free-to-rotate and it takes torsional energy of each pair of bonds into account [8]. Though this model neglects the effect of side chains and provides only a coarse-grained approximation to the real proteins, its off-lattice construction still reflects some basic features of real proteins [9, 10]. Due to the complexity of protein folding, it’s difficult to establish an accurate protein energy function. So many simplified energy functions are proposed as in [1, 11, 12], etc. And in this paper we use one of the most widely used energy function which is the same as that of [11] to predict the protein 3D structure. As for the global optimization methods, many algorithms have been proposed. Today, many algorithms have been proposed to predict protein structure using HP-lattice model, such as Multi-Self-Overlap Ensemble (MSOE), Ant Colony Optimization (ACO) [13], Multi-crossover and mutation Partial Swarm Optimization-Tabu Search algorithm (MCMPSO-TS) [14], etc. The MOSE algorithm uses a heuristic bias function to help the theoretical protein structure form a hydrophobic core, but it is only efficient for the proteins of simple folding [13]. The ACO algorithm’s inspiration is from the observation of ants’ behavior of searching for food, and the algorithm structure can be divided into three parts: construct ants solutions, update pheromone, and daemon actions [13]. The MCMPSO-TS algorithm is a hybrid search algorithm which combines the particle swarm optimizer (PSO) algorithm and tabu search (TS) algorithm to get better global optimization ability [14]. The used algorithms to predict protein structure with the AB off-lattice model are as follows, PSO, Tabu Search-Particle Swarm Optimization (TPSO) [15], Levy Flight Particle Swarm Optimizer (LPSO) [16], Genetic-Particle Swarm Optimization-Tabu Search algorithm (PGATS) [17], Artificial Bee Colony algorithm (ABC) [18], etc. The PSO algorithm is put forward on the basis of information transmission process of migrating birds that every bird can not only remember the best place to find the distance of the food but also know the optimal location found by the population of all the birds [15]. The TPSO algorithm combines PSO and TS to take advantage of each algorithm to improve the search precision and the ability to jump out local optimal [15]. The LPSO algorithm adds a random process named levy flight to PSO algorithm to improve algorithm’s precision [16]. The PGATS algorithm is proposed on the basis of GA-PSO algorithm which is a hybrid algorithm using the strategy of updating parameter to guarantee the diversity of population in the late run of algorithm, while enhanced TS algorithm and particle mutation strategy are combined to jump out the local minimum [17]. The ABC algorithm is inspired from the behavior of bees, and the main feature of the algorithm is that it doesn’t need to know the special information of the problem while it only needs to compare the merits of the solutions [18]. Since the algorithms aim to predict the structure of proteins, the most important criterion is the lowest energy value of corresponding protein energy function. And this paper focuses on getting lower energy value. We propose a hybrid optimization algorithm based on improved BSA and TS algorithm. It’s not only less sensitivity to initial values of parameters but also more efficient than the algorithms above. Experiments using AB off-lattice model shows that it can get lower energy value than nearly all the algorithms above can get.

2 3D AB Off-Lattice Model

3D AB off-lattice model was proposed by Stillinger et al. [3] on the basis of two-dimensional AB off-lattice model, which considers that the main reason for residues chain to fold into a specific spatial structure is the hydrophobicity of amino acid residues. In this model, we divide the amino acid residues into hydrophobic (A) and hydrophilic (B) two types. So, protein sequences can be simplified as sequences consisted of only A and B [19, 20], and we can predict native structure of the proteins by the relation between bond angles and bond energy of corresponding amino acids.

In the 3D space of this model, the residues of amino acid sequences are sequentially connected into the polymer with no direction keys of unit length. We’d take the bond angles between adjacent keys and the torsional angles of the two planes constituted by the adjacent three key vectors into consideration when the model is used as 3D structure of a protein [2123]. Thus, the 3D structure of n residues is determined by \( n - 2 \) bond angles \( \theta_{1} ,\theta_{2} . \ldots ,\theta_{n - 2} \) and \( n - 3 \) torsional angles \( \beta_{1} ,\beta_{2} , \ldots ,\beta_{n - 3} \). We set the condition \( - \pi \le \theta_{i} ,\beta_{i} \le \pi \). \( \theta_{i} \) and \( \beta_{i} \) are positive when the angles clockwise rotate. Otherwise, the angles are negative. Figure 1 is the configuration diagram of AB off-lattice model.

Fig. 1.
figure 1

Configuration diagram of AB off-lattice model

The energy function of AB off-lattice model consists of bending energy and potential energy of the Lennard-Jones type [2427]. The energy function (\( E \)) of a sequence of n residues is expressed as Eq. (1) [28]:

$$ E = \sum\limits_{i = 1}^{n - 2} {\frac{1}{4}} (1 - \cos \theta_{i} ) + \sum\limits_{i = 1}^{n - 2} {\sum\limits_{j = i + 2}^{n} {4[r_{ij}^{ - 12} - C(\xi_{i} ,\xi_{j} )r_{ij}^{ - 6} ]} } $$
(1)

The first part is bending potential energy which is only associated with bond angles. The second part is gravitational potential energy of any two non-adjacent residues which is not only related to polarity but also the distance of them. Here, \( r_{ij} \) represents spatial distance of non-adjacent residues, \( \xi_{i} \) represents the category of different residues. If the \( i - th \) residue is A, \( \xi_{i} = 1 \), if it is B, \( \xi_{i} = - 1 \).

Where:

$$ C(\xi_{i} ,\xi_{j} ) = \left\{ {\begin{array}{*{20}c} { + 1} & {\xi_{i} = 1,\xi_{j} = 1} \\ { + 0.5} & {\xi_{i} = - 1,\xi_{j} = - 1} \\ { - 0.5} & {\xi_{i} \ne \xi_{j} } \\ \end{array} } \right. $$
(2)

We can see from Eq. (2) [28] that the pairs of residues AA has strong gravity, the pairs of residues BB has less gravity, and AB has weak repulsion. It reflects the true characteristic of real protein to a certain extent that hydrophobic cores form due to larger gravitational between hydrophobic residues when the protein fold into a spatial structure, at the same time, hydrophilic residues are excluded to the outside.

Therefore, the problem of protein structure prediction in the AB off-lattice model becomes the problem to find the \( n - 2 \) bond angles and \( n - 3 \) torsional angles to minimize the protein energy function \( E \), as Eq. (3) [17]:

$$ \mathop {\hbox{min} }\limits_{{\theta_{i} ,\beta_{i} \in \left( { - \pi ,\pi ]} \right.}} E\left( {\theta_{1} ,\theta_{2} , \ldots ,\theta_{n - 2} ,\beta_{1} ,\beta_{2} , \ldots ,\beta_{n - 3} } \right) $$
(3)

3 BSA-TS Algorithm

3.1 BSA

Backtracking Search Optimization Algorithm (BSA) [29] was put forward by Pinar Civicioglu in 2013 for solving real-valued numerical optimization problems. Unlike many evolutionary algorithms, BSA only has one control parameter, and the initial value of the parameter has limited impact on the problem-solving performance. BSA has a simple but effective structure which enables it to easily solve multimodal problems.

3.1.1 BSA’s General Algorithm Framework

The algorithm framework of BSA is analogous to differential evolution algorithms, and it can be explained by dividing its functions into five parts: initialization of population, selection-I, mutation, crossover, and selection-II [29, 30]. Figure 2 is BSA’s general algorithm framework:

Fig. 2.
figure 2

BSA’s general algorithm framework

It has two new crossover and mutation operators to generate a trial population, and the strategy for controlling the search-space boundaries and amplitude of the search-direction matrix enables it to have very strong exploitation capabilities.

3.1.2 Improved Strategies of BSA

(1) At the end of crossover strategy, we use two randomly choices to update the individuals beyond the search-space limits. One is the standard method that we regenerate the \( j - th \) element of individual \( M_{i} \) beyond the search-space limits using \( M_{ij} = rand*(up_{j} - low_{j} ) + low_{j} \), where \( up_{j} \) means the upper bound and \( low_{j} \) means the lower bound of the \( j - th \) component of problem; another boundary control method is that if the value of trial population is greater than the upper bound, \( M_{ij} \) will be set as \( up_{j} \), and if the value of \( M_{ij} \) is less than \( low_{j} \), \( M_{ij} \) will be set as \( low_{j} \).

(2) Since BSA’s convergence speed is slow, this paper will use Penalty strategy to replace Selection-II strategy: First, we sort the parent population \( P \) in ascending order according to the fitness value, then we take its first half as \( R \) and combine \( R \) and the trial population \( T \) obtained after the operation of crossover as a new population \( Mu \). Secondly, the Euclidean distance \( d_{ij} \) between the individuals of \( Mu \) is calculated, and if \( d_{ij} \) is less than D (experience value, related to the length of protein sequence), the greater fitness value between \( Mu_{i} \) and \( Mu_{j} \) will be replaced by \( Penalty \) (experience value, this paper take 1013), and then we sort the individuals of \( Mu \) in ascending order according to the fitness value and take the first \( popsize \) individuals to update population \( P \).

3.2 TS Algorithm

TS algorithm was put forward by Fred Glover in 1986 [3133] based on the local neighborhood search algorithm. Considering that the local neighborhood search is too greedy for a local search which is easy to lead to falling into local minimum, tabu strategy is used in TS to gain more search space to jump out of a local optimal solution (not completely abandoned) and the good individuals can avoid to be missed with the application of aspiration criterion.

3.2.1 Description of TS

The process of TS can be described as follows:

Step 1: Set the algorithm parameters, initialize the current solution randomly and set tabu list blank.

Step 2: Judge whether the termination criterion is satisfied; if it is, jump to step 6.

Step 3: Generate the neighborhood solutions of the current solution, and determine the candidate solutions.

Step 4: Judge whether the aspiration criterion is satisfied; if it is, set the solution as the current solution, put its fitness into tabu list, update the optimal state, and jump to step 2.

Step 5: Judge the taboo attributes of candidate solutions, set the optimal state of candidate solutions which is not in tabu list as current solution, and jump to step 2.

Step 6: Output the optimization results.

3.2.2 Improved Strategy of TS

TS algorithm is used in the second part of BSA-TS hybrid optimization algorithm so that the algorithm can jump out of a local optimal solution. In this paper, we use a mutation operator to generate the neighborhoods of current solution. On the one hand, the mutation operator can guarantee the diversity of particles in the early stage of the algorithm; on the other hand, it produces less disruption to the current solution to ensure the global convergence of the algorithm. The strategy is as follows:

Select the \( k - th \) element of individual \( x \), then perform mutation operator, the \( k - th \) element of new individual \( x_{new} \) after mutation is as follows [31]:

$$ x_{new}^{k} = x^{k} + 2*\pi *f(r)*c*rate^{i} $$
(4)

Where \( c \) is a random number between 0 and 1, \( rate \) is scale factor, \( i \) is the current iteration the range of which is from 0 to K-1 (K is the size of neighbors). In this paper, we set \( rate \) as 0.95 which is the same as that in reference [31]. \( f(r) \) is the correlation coefficient which is defined with Eq. (5) [17]:

$$ f(r) = \left\{ {\begin{array}{*{20}c} {1,r \ge 0.5} \\ { - 1,r < 0.5} \\ \end{array} } \right. $$
(5)

Where \( r \) is a random number between 0 and 1.

3.3 BSA-TS Algorithm

3.3.1 Parameters

In this paper, the BSA-TS algorithm is realized through MATLAB R2012a in Windows 7 system. According to the references and the experience of experiment, the parameters are set as follows: the population size \( popsize \) is 200; the maximum number of iterations of the part of BSA \( maxcycle \) is 3000; crossover rate \( mixrate \) is 0.88; \( Penalty \) is 1013 (appropriate adjustments are needed according to different sequence lengths). The size of neighborhood \( K \) is 800 during the part of TS algorithm; the maximum iteration of the part of TS \( W \) is 2000; tabu length \( tabulength \) is 10 + N, and \( rate \) is 0.95.

3.3.2 Initialization

It’s need to initialize the coordinates of each individual when we calculate the corresponding fitness value. The coordinates are all calculated with Eq. (6).

$$ (x_{i} ,y_{i} ,z_{i} ) = \left\{ {\begin{array}{*{20}l} {(0,0,0)} \hfill & {i = 1} \hfill \\ {(0,1,0)} \hfill & {i = 2} \hfill \\ {(\cos \theta_{1} ,\sin \theta_{1} ,0)} \hfill & {i = 3} \hfill \\ {(x_{i - 1} + \cos (\theta_{i - 2} )\cos (\beta_{i - 3} ),y_{i - 1} + \sin (\theta_{i - 2} )\cos (\beta_{i - 3} ),z_{i - 1} + \sin (\beta_{i - 3} ))} \hfill & {4 \le i \le n} \hfill \\ \end{array} } \right. $$
(6)

3.3.3 Description of BSA-TS Algorithm

During the BSA-TS algorithm based on AB off-lattice model, the first step is to initialize each parameter, and generate population \( P \) and population \( P^{old} \) randomly which both contain N individuals whose range is \( \left[ { - \pi ,\pi } \right] \). Then the improved BSA algorithm is used to obtain the local optimal solution. At last, the result gained by improved BSA is used as the initial solution of TS algorithm in order to avoid the low efficiency of search caused by casual initialization.

The algorithm framework of BSA-TS is as Fig. 3:

Fig. 3.
figure 3

Algorithm framework of BSA-TS

As we can see from Fig. 3, the process of BSA-TS can be briefly described as follows:

Step 1: Set algorithm parameters and initialize the population \( P \) and \( P^{old} \) randomly.

Step 2: Judge the termination criterion of the part of BSA that whether \( epk < \hbox{max} cycle \); if it does, continue execution, else jump to step 4.

Step 3: Execute the Selection-I, mutation, crossover, boundary control, and penalty sequentially.

Step 4: Set global optimal solution obtained by BSA as the initial current solution and set the tabu list blank.

Step 5: Execute the part of improved TS, and output the optimization result when the termination criterion of TS is met.

4 Experimental Results and Discussion

4.1 Prediction with Fibonacci Sequences

Fibonacci Sequences [34] usually used in the protein structure prediction are defined as follows:

$$ S_{0} = A,S_{1} = B, \ldots ,S_{i + 1} = S_{i - 1} *S_{i} $$
(7)

where ‘*’ is a connection symbol, A denotes a hydrophilic amino acid and B denotes a hydrophobic amino acid.

Table 1 shows the trial Fibonacci Sequences. Table 2 shows the comparison of lowest energy values of the 3D structures of corresponding Fibonacci Sequences gained by PSO, TPSO, LPSO, PGATS, ABC and BSA-TS.

Table 1. Fibonacci Sequences
Table 2. Lowest energy values of Fibonacci Sequences

We can see from the Table 2 that the lowest energy values of the Fibonacci Sequences gained by BSA-TS are lower than those obtained by PS0, TPSO, LPSO, PGATS and ABC except the sequence length of 13. The detailed description is as follows. When the sequence length is 13, the lowest energy value BSA-TS get is −1.3852 while the lowest energy value TPSO gets is −4.7284 and the lowest energy value LPSO gets is −4.6159 which means the 3D structure we get is less stable than TPSO and LPSO gets but more stable than the ones other algorithms can get. Additionally, when the sequence length is 21, the lowest energy value BSA-TS gets is −9.9878 while the lowest energy value other algorithms can get is −8.7379, and when the sequence length is 34 or 55, the lowest energy values BSA-TS get are −11.3741 and −23.1124 correspondingly while the lowest energy values others get are only −10.3953 and −22.4172 correspondingly. Therefore, we can conclude that the BSA-TS algorithm can effectively predict the 3D structure of protein using Fibonacci Sequences. In addition, Table 3 lists the bond angles and torsional angles of the corresponding lowest energy configurations predicted by BSA-TS algorithm.

Table 3. The bond angles and torsional angles of the Fibonacci Sequences

4.2 Prediction with Real Protein Sequences

In this section, we use the real protein sequences which are the same as those in reference [16]. These protein sequences can be downloaded from PDB database. The real protein sequences used in this paper are as follows:

In Table 4 above, A, C, G, I, L, P, M and V are hydrophobic amino acids, and D, E, F, H, K, N, Q, R, S, T, W and Y are hydrophilic amino acids.

Table 4. Fibonacci Sequences

Table 5 shows the lowest energy values of real protein sequences.

Table 5. Lowest energy values of real protein sequences

We can see from Table 5 that the lowest energy values of the real protein sequences obtained by BSA-TS are lower than all those gained by PSO, TPSO, LPSO, PGATS and ABC. When the real protein sequence is 2KGU, 1CRN, 2KAP or 1PCH, the lowest energy values BSA-TS get are −34.2979, −57.6271, −36.1682 and −64.2034 correspondingly while the lowest energy values others get are −32.2599, −52.3249, −30.3643 and −63.4272 correspondingly. So we can conclude that our algorithm is feasible and effective to predict the 3D structure of real proteins.

Tables 2 and 5 show us that BSA-TS algorithm is efficient especially in real protein 3D structure prediction. When we predict protein 3D structure using Fibonacci Sequences, our algorithm is better-behaved as the sequence length is longer.

Table 6 lists the bond angles and torsional angles of the real protein sequences.

Table 6. The bond angles and torsional angles of the real protein sequences

The following Figs. 45, 6 and 7 show the 3D structures of real protein sequences predicted by BSA-TS algorithm and the real 3D protein structures downloaded from PDB database. We can conclude that the 3D structures predicted by BSA-TS algorithm are similar to the native structures of real protein sequences.

Fig. 4.
figure 4

Comparison of 2KGU

Fig. 5.
figure 5

Comparison of 1CRN

Fig. 6.
figure 6

Comparison of 2KAP

Fig. 7.
figure 7

Comparison of 1PCH

5 Conclusion

Due to the importance of 3D protein structure prediction, many theoretical prediction algorithms are proposed. But on account of the complexity of prediction, the present algorithms’ accuracy is usually unsatisfactory. Since the most important criterion is the accuracy of algorithm, we focus on getting more stable structure with some widely used Fibonacci Sequences and real protein sequences.

This paper proposes a BSA-TS hybrid optimization algorithm which integrates the advantages of BSA that it has an efficient, fast structure, less control parameters and powerful search capability and the advantage of TS that it has strong local search capability and high precision of search. Additionally, the addition of penalty method can effectively reduce the probability of falling into local optimum caused by general BSA’s Selection-II strategy. Then we use BSA-TS to predict the structure of protein based on the 3D AB off-lattice model. The experimental result shows that BSA-TS can get more stable structures which have lower energy compared with the structures gained by other algorithms. That indicates the feasibility of BSA-TS on protein structure prediction problem.