1 Introduction

Various kinds of optimization algorithms have been implemented to tackle folding in homology modelling, threading and ab initio based on artificial intelligence and hybrid approaches. Protein folding is non-deterministic polynomial-time hard problem for identifying protein conformation and folding process. Ant colony optimization (ACO) is used in various problems like routing vehicles, dynamic problems, stochastic problems, multi-target implementation, multi-target parallel implementations, software testing, travelling salesman problem and protein folding. Ant colony meta-heuristic optimization is one of the few algorithms that can help to gain an atomic level insight into the conformation of protein folding states and its intermediate weights and pheromone present along the protein folding. This ant colony optimization algorithm is inspired by research on the behaviour of real ant colonies. Ant colonies are distributed systems which perform complex tasks for finding food in an optimized way. Ant algorithms are derived from the observation of real ant’s behaviour and optimized for distributed control problems [1]. The different aspects of the behaviour of ant colonies are useful in solving computational protein folding problems. The main logic behind ant colony algorithms is to solve the NP hard problem of protein folding mechanism [2]. The ants exhibit a complex behavioural pattern which helps them to find optimized shorter path between the nest and the food. Ants are visually impaired and an important aspect of their communication is based on chemicals released and deposited (pheromones) by them [3]. They tend to choose paths marked by strong pheromone concentration (pheromone trail) for food. Proteins may also be considered as a colony where amino acids are nodes attached to each other with the help of peptide bonds. Protein structure folds in a compact form. Using ACO behaviour, researchers can identify the folding pattern of proteins [4]. Similar to the behavioural pattern exhibited by ants, proteins can also be optimized to have a shorter connecting path between one amino acid to another amino acid [5]. An interesting experiment was conducted on ants involving a double bridge connecting the ant nest to food source to study the pheromone trail laying and following behaviour. Initially, only the long branch was opened to ants and later the short branch was also offered. Since the starting pheromone concentration was high on the long branch with slow evaporation of pheromone, so majority of ants always chose the long branch even after the appearance of shorter branch [6, 7]. Similarly, in protein structure starting amino acid bonding is very strong on the long branch with standard distance factors using general AMBER force field (GAFF), and other factors like temperature, pressure, volume, degree of freedom and total time of protein folding. With these ensembles, proteins are also dependent on standard distance factors; when distance factors are changed in proteins, it causes random or specific mutation in proteins resulting in diseases.

1.1 Monte Carlo simulation

The Monte Carlo simulation (MC) algorithms is based on the energy distribution for a given protein temperature and executes temperature simulation for protein folding pattern [810]. MC algorithms are based on conformational states and their searching efficiency has been enhanced in the Basin Hopping approach, which couples large step Monte Carlo jumps with gradient-driven local minimization [11, 12]. After searching the final energy distribution, it modifies the transition probability to accelerate the transition between different states [13]. Monte Carlo minimization has been successfully applied to the conformational searching in protein folding pattern [2] by executing the local energy minimization of each trajectory of protein folding pattern [14]. Monte Carlo approach allows efficient exploration of protein conformation and is comparable with genetic algorithm and other heuristic approaches.

1.2 Molecular dynamics

Molecular dynamics (MD) simulation is based on Newton’s equations of motion. It monitors atom movements during protein folding pattern [15] and is one of the most useful methods for known biological complexity of protein folding problem [16, 17]. The long MD simulation is a major limiting factor as the incremental timescale is in the order of femtoseconds, while the fastest protein folding pattern timing of a small protein less than 100 residues is in the millisecond range [1822]. There are many softwares for MD simulation which are also used for the structure refinements of low-resolution model [23, 24]. Number of parameters like implemented torsions, ensembles and coarse-grained energy functions are used for refinement [25, 26]. Molecular dynamics simulation is based on Newton’s equations of motion to all atoms concurrently over a small time step to conclude new atomic positions and velocities. In cases of Monte Carlo (MC) and molecular dynamics (MD), the force field controls the total energy, which concludes the evolution of the systems. Molecular dynamics simulation is proven to be a powerful approach for studying protein dynamics.

1.3 Genetic algorithm

Conformational space annealing is based on one of the genetic algorithms (GAs) for protein folding pattern. Using GA folding pattern of the proteins can be identified [27, 28]. The MC algorithms are based on local minima, searches whole conformation of the protein and generate low-energy conformation, while conformational space annealing applies various global optimizations, searches whole conformation of the protein and generates low energy for folding pattern of the protein [29, 30]. Conformational space annealing has been successfully applied to ab initio modelling of the protein.

2 Materials and methods

The flow chart (Fig. 1) represents the movement of ant (protein) based on two parameters A (heuristics as a force fields of amino acid to other amino acids) and B (pheromones as a distance of amino acid to other amino acids).

Fig. 1
figure 1

Flow chart representation of ACO algorithm, movement of ant (protein) based on two parameters

2.1 Ant colony algorithms

  1. 1.

    Implementation The methodology for ant colony optimization algorithm implementation for the protein folding pattern, which is NP hard problem [31], and generation of various conformational states using ACO [3237] is shown in Fig. 1. A and B are two parameters where A (heuristics as a force fields of amino acid to other amino acids) and B (pheromones as a distance of amino acid to other amino acids) determine the relative influence of the force fields (taken from the GAFF) or distance factors (taken from the GAFF) [38]. These factors are responsible for the trail of amino acids and the heuristic information on the path traversed by amino acids. Initialization (I) is initial distance factor and heuristic value deposited on each amino acid in proteins, respectively. Initially, (I) is set equal to the depth of proteins and I is set equal to the number of decision node (amino acid). This variable specifies the count status of node (amino acid) by the amino acids k, initially set to 0. N k i is the feasible neighbourhood of amino acids k when being at node i, initially set to zero. Key = End node in proteins, Pframe = [Total no of node (amino acids in the protein)]. Pframe = [1, 2, 3, 4, 5…EndAA], NC = total node sequence covered up to now, calculation of depth of proteins using algorithm ACO_ DEPTH. Initialization Init accordingly and determination of number of decision node in proteins and setting up the initial value. When expanded a fractional conformation I k I i to I i+1 during the edifice phase of ant colony optimization algorithms, next to kin direction d of I i+1 to I i−1 is resolved based on heuristic (force fields K ij ) and pheromone (distance of amino acids r ij ) values according to following probabilities \(P_{i} d = \left( {T^{\alpha }_{xy} } \right)\left( {\eta^{\beta }_{xy} } \right)/\sum_{y} \in {\text{allowed}}_{y} \left( {T^{\alpha }_{xy} } \right)\left( {\eta^{\beta }_{xy} } \right).\)

  2. 2.

    Application development We developed Perl application, for validation of proposed algorithm and also to find out protein folding pattern of the protein using ACO algorithm. In this application by selecting PFP button of the application, the desired protein in PDB format can be selected. This would generate coordinates of given PDB file and save it into analysis file folder. This output can be used to investigate the optimized folding pattern of the given PDB file.

3 Results and discussion

We report ACO algorithm for protein (including membrane protein) folding prediction. We have designed new algorithm for protein conformation and protein folding. The algorithm traverses each node and prioritizes the path according to the path strength. Paths having the standard distance factors strength are given the highest priority for testing followed by next lower standard distance factors strength. This algorithm finds optimized path for protein folding (native conformations) within nanoseconds of CPU. The protein with the PDB code: 4BEY (night blindness causing G90D rhodopsin) was used as an example (Fig. 1) to prioritize the various conformation states using ant colony optimization algorithms. The algorithm was applied as follows.

GAFF (general AMBER force field) is well matched to the AMBER force fields, GAFF is appropriate to study range of molecules, and generally, all the organic molecules are made of C, H, S, O, N, P, F, Br, Cl and I. The interaction force fields (K ij ) parameter between one molecule to another molecule and interaction distance (r ij ) parameter between one molecule to another molecule are described in supplementary Table 1. AMBER and GAFF force fields have been reported to work well in case of drug designing, biological molecules and organic molecules. GAFF is more compatible for rational drug designing and applies harmonic function form as following

$$\begin{aligned} E_{\text{Pair}} & = \sum\limits_{\text{Bonds}} {K_{r} \left( {r - r_{\text{eq}} } \right)^{2} + \sum\limits_{\text{Angles}} {K_{\theta } \left( {\theta - \theta_{\text{eq}} } \right)^{2} } } + \sum\limits_{\text{Dihedrals}} {V_{n} /2\left[ {1 + {\text{COS}}\left( {n\varphi - \varUpsilon } \right)} \right]} \\ & \quad + \sum\limits_{i < j} {\left[ {A_{ij} /R_{ij}^{12} - B_{ij} /R_{ij}^{6} + q_{i} q_{j} /R_{ij} } \right]} . \\ \end{aligned}$$

3.1 ACO algorithms implementation for identification of protein folding pattern using standard distance and generalized AMBER force fields (GAFF) parameters

  1. 1.

    Firstly, the amino acids starting node A and its neighbourhood N k s  = 1 was defined. Heuristic was known from the generalized AMBER force fields. One per cent distance factors is evaporated from the node covered up to now, i.e. node A − 1 initially has approximately 1 distance factor value which after evaporation is left to 0.082. Optimized path covered after finding each next node to be visited upon is calculated and thus its length, i.e. A − 1 the optimized path is A − 1 where length is equal to 1. The next node to be moved upon is node 1.

  2. 2.

    As the next node G to be moved upon is not equal to key node (i.e. end node), the amino acids starting node are now node 1 and the above steps are again followed.

  3. 3.

    Applied random proportional rule to decide the next node of ant, probabilities of visiting the ant from node to node, the process is repeated till the ant reaches the destination node END_AA.

  4. 4.

    As node covered up to now = i which is not equal to Pframe = {M, C, G, T, E), the amino acid again starts from the start node i.

  5. 5.

    As node covered up to now = (M, C, G, T, E). The ant had now covered one full path starting from node 1 to node END_AA. The amount of distance factors deposited on each edge in the traversed path up to now is calculated, and the net pheromone is updated on each edge in the path equal to amount left after evaporation of distance factors + amount deposited after path traversed.

  6. 6.

    The change in heuristic (GAFF) is calculated, i.e. Δ n j i  = n ij /C k i .

  7. 7.

    As node covered until now = (M, C, G, T, E) which is not equal to Pframe = (M, C, G, T, E), the ant again starts from the start node i.

  8. 8.

    The process is continued till all nodes are covered.

  9. 9.

    When node covered = Pframe, the strength of each path was calculated, for example, for the path (M, C, G, T, E), and the strength is calculated as I − 1 = (final pheromone value) × (final heuristic value). Similarly, for the others edges in the path we calculate edge strengths. Finally, adding all edge’s strength in the path gives the final value for the path strength (Table 1)

    Table 1 Probability-based result summary of different moves of an ant path, here we use PDB:4BEY [39] PDB file (night blindness causing G90D rhodopsin) for calculation of protein folding pattern using ant colony optimization algorithms

Further, the algorithm using was compared and validated using PDB:4BEY protein. There was a good correlation in the results of observed and the experimental results. Figure 2a represents the folding pattern priority on the basis of nearest distance and favourable force fields using ant colony optimization (ACO) where different colours represent amino acids of small peptide (PDB:4BEY) and their M to E movement. The Fig. 2b represents the folding pattern of PDB:4BEY on the basis of nearest distance and favourable force fields using ant colony optimization (ACO) as in 2a in graphical form, where blue colour represents MET (node 1), cyan colour represents CYS (node 2), green colour represents GLY (node 3), orange colour represents THR (node 4), and red colour represents GLU (node 5) amino acids. The best optimized and prioritized folding paths are given in Table 2 covering all the residues of PDB:4BEY.

Fig. 2
figure 2

Panel a represents the PDB:4BEY protein folding pattern using ant colony optimization. Panel b represents the small peptide folding pattern of PDB:4BEY

Table 2 Independent path’s strength versus priority

The final path according to the total path strength and priority is shown in Table 2. The path having the maximum combined strength of pheromone and heuristic has been given the highest priority.

A Perl-based application protein folding energy-based recognition tool (PFEBRT) was developed as shown in Fig. 3a, b to determine the protein folding pattern of the protein using ant colony optimization algorithms. With this, users can easily select desired protein in PDB format and generate coordinates of given PDB file and save it into analysed file folder. In this work, a crucial role is played by CPU time. The maximum number of local minima search of sequences using ant colony algorithm is improved with respect to time and minimum number of CPU usage.

Fig. 3
figure 3

a Represents the front panel view of protein folding energy-based recognition tool, and b represents the performance of protein folding energy-based recognition tool, where x axis represents different GPCRs at different length and y axis represents the CPU timing in femtoseconds

4 Conclusion

We have developed an ant colony-based algorithm and implemented in Perl language for protein folding. Using this application, the conformation of protein folding states, intermediates weights and pheromone present along the protein folding can be determined using amino acids interaction, ensembles and force fields. Protein folding is non-deterministic polynomial-time hard (NP hard) problem, using PDB files identification of protein conformation and folding process can be done. The ant colony meta-heuristic optimization is one of the few algorithms that can help to gain an atomic level insight into the conformation of protein folding states and intermediates weights and pheromone present along the protein folding. It finds a more optimized path of folding states. Development of ACO algorithms is more realistic and CPU-based models for protein structure folding pattern, i.e. NP hard problem.