Keywords

1 Introduction

Sequence alignment is paramount to molecular sequence analysis. It can help to build a phylogenetic tree of related DNA sequences or to predict the function/structure of unfamiliar protein sequences by aligning them with others whose function/structure is already acknowledged. Sequence alignment establishes an alignment of two or more sequences to maximize the similarities amid them [1]. This NP-hard problem [2, 3] can benefit from computational intelligence (CI) algorithms in reasonable processing time.

In recent years, metaheuristic procedures aided in producing approximate solutions for the MSA dilemma. Their primary rationale is to commence by an initial solution and ameliorate the MSA through a series of iterations until the solution does not become better any longer. This category of methods embraces tabu search (TS) [4], genetic algorithm (GA) [5], simulated annealing (SA) algorithm [6], ant colony (AC) algorithm [7], particle swarm optimization (PSO) [8], artificial bee colony (ABC) algorithm [9], and so on.

This research study brings in a hybrid line of attack named PSO-TS algorithm to devise an approximate solution to the MSA impasse combining the best characteristics of the PSO and the TS schemes.

The remainder of this document is organized as follows: Section 8.2 presents a brief literature review of the previous works pertinent to the PSO-TS framework. Section 8.3 depicts the basic concepts of the PSO and the TS metaheuristics. The details about the components of the PSO-TS algorithm are in Sect. 8.4. Simulation outcomes appear in Sect. 8.5, together with some discussions. Section 8.6 suggest improvements to the PSO-TS algorithm. Finally, this study ends with the comments from Sect. 8.7.

2 Literature Review

Among the significant multiple sequence approaches, some of them employ iterative alignment aligners. Rias et al. [4] applied a tabu search variant relying on the COFFEE objective function to find better results for the multiple sequences. In [5], the authors developed an amended GA with an intelligent selection strategy to resolve the multiple biological sequence dilemma. Proper alignments are obtained compared with those done by classical GA.

In Ref. [8], the authors applied PSO and Clustal X aligner to produce better results than those from Clustal X alone. An ABC algorithm for solving the MSA issue appears in [9]. In Ref. [10], V. Cutello et al. presented an immune-inspired algorithm (IMSA) to tackle the MSA dilemma utilizing ad hoc mutation operators. Simulation results on some BaliBASE v.1.0 instances show that IMSA is superior to other aligners such as PRRP, CLUSTAL X, SAGA, DIALIGN, PIMA, MULTIALIGN, and PILEUP8. An efficient method by using a multi-objective GA (MSAGMOGA) to discover optimal alignments is proposed in [11]. Experiments on the BaliBASE 2.0 database confirmed that MSAGMOGA obtained better results than MUSCLE, SAGA, and MSA-GA methods.

Recently, in [12], the authors announced a novel multi-objective evolutionary scheme called hybrid multi-objective ABC (HMOABC), which hinges on swarm intelligence for the MSA dilemma. Experimental results on some instances coming from the BaliBASE 3.0 database showed that the HMOABC is an auspicious approach for solving the MSA impasse. Reference [13] suggests a better-quality Chemical Reaction Optimization (CRO) algorithm for the MSA problem. In their work, the authors inserted an intelligent diversification mechanism in the initialization process to help the CRO algorithm to travel around the search space globally. Simulation results confirmed the superiority of this investigative effort over others.

3 Preliminaries

3.1 Particle Swarm Optimization (PSO)

Particle swarm optimization (PSO) is a metaheuristic (aka nature-inspired) optimization algorithm conceived by Kennedy and Eberhart [14] that had as its inspiration the bird flocking behavior. Rather than the hard optimization formulation, there is a new recast problem. This different formulation consists of a population of answers entitled particles, where the pair of features position and velocity characterize each particle. At each iteration, the particle’s locus is updated in accordance with its personal best position and the best solution of the swarm. The subsequent equations govern the evolution of the population:

$$ {\displaystyle \begin{array}{l}{V}^{\left(k+1\right)}=w.{V}^{(k)}+{c}_1.\mathit{\operatorname{ran}}{d}_1.\left( pbes{t}^{(k)}-{X}^{(k)}\right)\\ {}\kern3em +{c}_2.\mathit{\operatorname{ran}}{d}_2.\left( gbest\left({}^k\right)-{X}^{(k)}\right),\mathrm{and}\end{array}} $$
(8.1)
$$ {X}^{\left(k+1\right)}={X}^{(k)}+{V}^{\left(k+1\right)}, $$
(8.2)

where for each particle:

  • X is its position

  • V is its velocity

  • w represents the inertia weight

  • pbest is its best place

  • gbest is the global best position of the swarm

  • rand1, rand2 are random values within [0, 1]

  • c1, c2 are positive constants regulating the impact of the personal optimum as well as the global best solutions on the search course, respectively

  • k is the iteration number

The process stops after reaching a predefined number of iterations Itmax.

3.2 Tabu Search Optimization (TS)

TS appeared the first time in Glover’s manuscript (1986) [15] to unravel a wide range of hard optimization problems. TS starts with a random tentative answer and appraises the fitness function for the given solution. Then all possible neighbors of the given solution are generated and evaluated. To overcome the local optimum limit, TS saves the better local neighbors (after searching, examining, and, if necessary, discarding them) in a Tabu List (TL). Moreover, TS can exploit an aspiration rule to make the whole search processing continue or to devise a diversified guided search tactic to other promising regions.

4 Proposed Method

PSO’s main drawback is its premature convergence and local optimum problem. To evade this limit, TS is incorporated as a local search procedure to enrich the global best solution and to apply the diversification mechanism to guide the search to other promising regions of the search space. Then, the novel hybrid technique, called PSO-TS, consists of strong cooperation among PSO and TS. It makes full use of the exploration ability of PSO and the exploitation ability of TS.

In our MSA problem, PSO-TS works as follows:

  1. 1.

    PSO modifies both particle positions and velocities after establishing the initial population with the recommended swarm initialization strategy.

  2. 2.

    After that, a new global best solution (gbest*) emerges from the improvement of the current global best solution (gbest) via the TS procedure.

  3. 3.

    This iterative process ends when a stopping condition is reached.

The PSO-TS hybridization strategy has a flowchart of is along these lines (Fig. 8.1):

Fig. 8.1
figure 1

PSO-TS algorithm flowchart

4.1 Components

Solution Encoding

Each solution (particle) entails a set of vectors, whose elements indicate the position of the randomly interleaved gaps in the different sequences waiting for alignment. Here, the sequences should have the same length L (the typical L value is 1.2 times of the most prolonged sequences) [16]. Figure 8.2 brings in an example of this encoding scheme.

Fig. 8.2
figure 2

Sequence alignment: (a) Set of sequences to align, (b) insertion of gaps, and (c) solution representation

Swarm Initialization

The multiple small-popsize initialization strategy (MSPIS) [17] renders the initial population and ensures its diversity. Every time, there is a small number of chromosomes by a small-popsize tactic. The best two individuals are selected into the initial population until the number of chromosomes equal to the present population size. A summary of the detailed MSPIS steps becomes.

  • Step1. Generate a small number of chromosomes through the small-popsize initial method.

  • Step2. Calculate the mean value of the small number population fitness value, Meanpopi.

  • Step3. All the chromosomes whose fitness values exceed Meanpopi go into the initial population.

  • Step4. Repeat steps 1–3 until the number of chromosomes equal to the present population size.

This initialization method delivers an approximate optimal solution as near as possible.

Objective Function

Habitually, the objective function appraises the alignment quality in mathematical terms. For that, the score apportioned to each particle (alignment) consists of the summing of the scores (SP) of the alignment of each pair of sequences [18].

The score of each pair of sequences comprises the sum of the score assigned to the match of each pair of symbols. A substitution matrix, such as the PAM205 matrix, can achieve this effect. The objective function for an alignment A with k sequences is

$$ \mathrm{Score}(A)=\sum \limits_{i=1}^{k-1}\sum \limits_{j=i+1}^kS\left({A}_i,{A}_j\right), $$

where S(Ai, Aj) stands for the alignment score among two aligned sequences, Ai and Aj.

Particle Move Operator

This operator is necessary to update the position of the particle in the search space. Each swarm element movement is contingent on its current location and the same as the leader. In this work, the distance between the particle and the leader amounts to the proportion of matching gaps in the sequences as

$$ \mathrm{Distance}=\frac{\mathrm{matching}\kern0.5em \mathrm{gaps}}{\mathrm{total}\kern0.5em \mathrm{gaps}}. $$

This research introduced the crossover operator from [16] to transport each particle toward the leader. It entails dividing the current alignment into two segments according to a randomly picked crossover point from the range [1, Distance* L]. After that, the best-produced child replaces the current particle.

4.2 TS Components of the MSA Problem

Cost Function

It evaluates the quality of each candidate solution. In this research study, the authors retain the same one from the PSO algorithm.

Initial Solution

In the suggested PSO-TS methodology, the TS works as a local search process to ameliorate the global best solution (gbest) yielded by the PSO algorithm. Then, the PSO output becomes an initial solution for the TS procedure.

Generation of Neighbors

In order to generate a set of neighbors of the current solution, the authors apply a simple but effective strategy, which works as follows:

  1. (i)

    Firstly, it picks a random amino acid from a randomly chosen sequence in the alignment and checks whether one of its neighbors is a gap.

  2. (ii)

    If this is the circumstance, the algorithm swaps the elected amino acid with a gap neighbor.

  3. (iii)

    If both neighbors are gaps, then one of them is picked randomly [18].

Figure 8.3 portrays this mechanism.

Fig. 8.3
figure 3

Neighbors generation strategy

Tabu List Update

This list stores previously visited solutions to help the algorithm evade being trapped in local optima. Here, the user, depending on the size of the problem instances, predefines the tabu list length (TLL). Besides, the authors opted to modify the TLL value dynamically by increasing or diminishing it with a prefixed constant X. Such a mechanism helps to attain some other solution and to overcome the cycling problem.

Stopping Condition

The authors chose the most straightforward and most commonly used stopping criterion, i.e., to terminate upon reaching a predefined number of iterations.

The pseudo-code of our TS is given below:

Pseudo-code of TS algorithm

Step 0: Generate an initial alignment Ainit.

Initialize ITMAX, Ns, TLL

Set TL ← ∅, Acurr = Ainit , Abest = Ainit

Step 2: Iterate the following steps for ITMAX iterations

Step 2.1: Generate NS neighboring alignments

Ai (i=1,2 …, Ns) of ACURR

Step 2.2: A_bestBest_ neighbor(A1, A2, …ANs)

If (A_best ∉ TL) and S(A_best) > S Acurr) then

AcurrA_best

Else try next test alignment and go to step 2.1

Tl ← TL ∪ Acurr

If TL is full, remove the oldest solution from TL

Step 2.3: If S(Acurr) > S(Abest) then

AbestrAcurr

Step 3: Return ABEST and S(Abest)

After the description of both the PSO and TS algorithms, the global framework of the PSO-TS procedure is as follows:

PSO-TS Pseudo-code

1. Apply MSPIS strategy to generate an initial swarm

2. Fix NG value // Number of generations

3. Set j ← 1

4. While (j ≤ NG) do

4.1. Determine the leader particle value gbest

4.2. For each particle in the population do

a) Evaluate the distance between gbest and the particle

b) Select a crossover point randomly

c) Apply particle move mechanism

d) Assign to the current particle the best-produced child

End for

4.3 Update gbest // Calculate the new leader particle gbest

4.4 New_gbest ← TS(gbest) // Apply TS to improve gbest

4.5 Insert New_gbest into the current population

4.6. j ← j + 1

5. End while

6. Return gbset value and its correspondent particle. // Output the result.

5 Simulation and Results

The PSO-TS algorithm is implemented using JCreator version 3.5 and a personal computer with a 2.66 GHz Intel Pentium IV processor. A set of tests verified and validated the PSO-TS scheme, where the objective is to compare the PSO-TS method with other published works, including the IMSA approach [10] and the BPSO algorithm [19]. The BaliBASE SP score (SPS) results for both small and large datasets taken from the BaliBASE database [20] are portrayed in Tables 8.1 and 8.2.

Table 8.1 PSO-TS vs. IMSA: SPS comparative results for the BaliBASE test sets
Table 8.2 PSO-TS algorithm versus BPSO approach: SPS comparative results for the BaliBASE test sets

The subsequent procedure provides the SP score [21]: for a candidate alignment (individual) of N sequences encompassing M columns, then the i-th column within the alignment becomes Ai1, Ai2, …., AiN. A value of pijk characterizes each pair of residues Aij and Aik. If pijk = 1, it means that the candidate alignment residues Aij and Aik have been aligned with each other as far as the reference alignment is concerned. Otherwise, pijk = 0. Consequently, the score for the i-th column turns out to be

$$ {S}_i=\sum \limits_{j=1,j\ne k}^N\sum \limits_{k=1}^N{p}_{ijk}. $$

For the candidate alignment, the overall SPS becomes

$$ \mathrm{SPS}=\sum \limits_{i=1}^M\frac{S_i}{\sum \limits_{i=1}^{Mr}{S}_{ri}}, $$

with the number of columns in the reference alignment given by Mr while Sri is the score Si for the i-th column for the reference alignment. The values of SPS lie in the interval [0.0, 1.0]. High SPS values point toward closer likeness with the BaliBASE reference alignment [21].

From Tables 8.1 and 8.2, it is clear that comparing with IMSA and BPSO, the average SPS values of PSO-TS algorithm are more significant than or comparable with those obtained by the techniques stated above for the short, medium, and long sequences. In addition, significant improvements are observed in test cases with a large number of sequences, which shows that our approach can work well with large problem sizes.

6 Future Trends

Other metaheuristics and multi-objective optimization schemes can aid in unraveling this bioinformatics impasse [22,23,24].

As perspective innovations of this work, the authors contemplate (i) integration of other mechanisms within the neighborhood generation step or (ii) intensification/diversification strategies in TS core to improve the quality of multiple alignments. In addition, the authors will use other powerful PSO variants as a part of this hybridization tactic to find a better solution for the MSA problem. A comparison of the proposed method with some other state-of-the-art techniques such as Clustal W, SAGA, or MULTALIGN, with other benchmarks, is possible to verify its enormous potential [25,26,27].

Deep learning (DL) has been in the spotlight as a potent tactic that makes noteworthy signs of progress when it comes to solving the issues plaguing the artificial intelligence. Nevertheless, numerous caveats, e.g., local minima trapping, inferior performance, and elevated computational time, still arise while applying DL to MSA. So, global optimization procedures like differential search algorithms can aid DL deployments to reach the best outcome and data [28,29,30,31].

7 Conclusion

The MSA problem is one of the primary techniques utilized in computational biology, e.g., genomic annotation, homology searches, gene regulation networks, protein structure prediction, and functional genomics, to name a few. Among MSA bottlenecks, there is the alignment of more than a pair of biological sequences, which is an NP-hard optimization problem.

This research showcases the PSO-TS approach, which is a novel hybrid algorithm constructed on metaheuristics developed to tackle the multiple sequence alignment (MSA) problem. The PSO-TS framework combines the PSO benefits, particularly its simplicity in implementation and its inexpensive computational processing time, with the best tabu search algorithm features, i.e., its local upgrading practice. These improvements enrich the global best solution by augmenting the diversification process. The results summarized in this paper show the superior capability of our hybrid approach compared to some other literature works.