1 Introduction

Unmanned aerial vehicles (UAVs) are attracting the interest of many researchers all over the world [1]. This popularity may be attributed both to advances in theoretical analysis [2, 3] and to potential use in many applications such as search and rescue missions, surveillance, law enforcement, inspection, mapping, and aerial cinematography.

Compared with a single UAV, a formation of UAVs can leverage the capabilities of the team to perform more effectively in missions such as cooperative simultaneous localization and mapping (SLAM), coverage and reconnaissance, and security patrol [4]. Hence, recent years have seen increasing interest in the study of UAV formation control from both theoretical and experimental points of view [5–7].

In the literature, many methods have been applied to formation-keeping control. Zhang and Liu [8] combined a Kalman filter with a PID controller to design a UAV formation flight control system. Xie et al. [9] proposed two nonlinear robust formation control algorithms to solve the problem of designing nonlinear robust formation controllers for a team of UAVs using off-the-shelf autopilots. Paul et al. [10] presented a solution for formation flight and formation reconfiguration of UAVs, based on a virtual leader approach combined with an extended local potential field. However, these approaches all have their own strengths and weaknesses, and each is better suited to some applications than to others [11].

In this paper, we adopt the RHC approach to solve the formation control problem for fixed-wing UAVs. RHC is an optimization-based control method originating in the process industry in the early 1970s [12]. Recently, RHC has been used to achieve formation flight and other cooperative tasks. The core idea of RHC is online receding/moving-horizon optimization: an optimal control problem is solved repeatedly, the first input of the optimal command sequence is applied, and the state is then updated. This breaks the global control problem into several smaller local optimization problems, and thus can decrease the computational complexity and expense significantly. In the formation control problem, RHC is an effective method for solving constrained optimization, with the following advantages [13]:

  1. (1)

    Different control objectives can be achieved by changing appropriate terms in the cost function (e.g., formation keeping, formation joining, and flying) [14];

  2. (2)

    The RHC strategy can adapt to the change of the conditions (e.g., topography, threat source, and internal instructions);

  3. (3)

    RHC can handle control input constraints and system state constraints, such as the flight state constraints of UAVs.

The selection of the RHC control parameters is a very tough problem in formation control, on account of the strong coupling among the inputs and the lack of an explicit mapping between the performance index and the controller parameters [15]. Some researchers have tried to use intelligent algorithms to solve this problem, for instance, particle swarm optimization (PSO) [16], the artificial bee colony (ABC) algorithm, and differential evolution (DE) [17]. Unfortunately, as the complexity of the optimization problem grows, these intelligent algorithms tend to become trapped in local optima and suffer from unsatisfactory convergence rates.

A novel algorithm named brain storm optimization (BSO) was first introduced by Shi [18] in recent years. Unlike optimization algorithms inspired by the collective behavior of insects such as ants and bees, it is inspired by the human brainstorming process. BSO generally uses grouping, replacing, and creating operators to produce as many ideas as possible, approaching the global optimum of the problem generation by generation [19].

In this paper, owing to its better global exploration performance, BSO is applied to the control field to optimize the RHC control parameters in the UAV formation control problem, so as to minimize the value of the cost function. Several modifications targeting the problems of local optima and slow convergence are proposed to enhance the performance of conventional BSO.

The rest of the paper is organized as follows. Section 2 proposes the formation control scheme based on RHC, covering the selection of UAV model, the design of the cost function, and the overview of RHC formation controller. Section 3 focuses on three aspects of the specific improvement measures in the modified BSO. Simulation validations, together with comparison against the basic BSO and PSO, are presented in Sect. 4, and our concluding remarks are drawn in Sect. 5.

2 UAV formation control problem based on RHC

2.1 Principle of RHC

RHC, which has the advantage of online handling of constraints on control inputs and outputs, formulates the control problem as a finite-horizon constrained optimization problem [13]. At each sampling time, minimizing the estimated objective over a fixed time interval, subject to the estimated dynamics and constraints, yields the optimal control input sequence over the prediction horizon; the first item of the sequence is applied as the RHC input, while the remaining items are discarded. The system state then evolves under this control input, and at the next time step the process is repeated with updated estimates of the current state and future quantities. Through this repetition, receding horizon control is implemented. The general architecture of RHC is shown in Fig. 1a.

Fig. 1
figure 1

Process of RHC

2.2 UAV model

2.2.1 Kinematic and dynamic models

In this paper, point-mass aircraft model [20] is used to describe the motion of formation flying UAVs. The related variables are defined with respect to the inertial coordinate frame \((\hat{{x}},\hat{{y}},\hat{{h}})\) and are shown in Fig. 2.

Fig. 2
figure 2

UAV model

In what follows, the model assumes that the aircraft thrust is directed along the velocity vector and that the aircraft always performs coordinated maneuvers. It is also assumed that the Earth is flat and that fuel expenditure is negligible (i.e., the mass is time-invariant). Under these assumptions, the UAV equations of motion can be described by the following equations:

$$\begin{aligned} \begin{array}{l} \dot{x}_i =V_{gi} \cos \gamma _i \cos \chi _i \\ \dot{y}_i =V_{gi} \cos \gamma _i \sin \chi _i \\ \dot{h}_i =V_{gi} \sin \gamma _i \\ \end{array} \end{aligned}$$
(1)

where \(i = 1, \ldots , n\) is the index of multiple UAVs and \(n\) is the number of UAVs. For UAV \(i\), \(x_{i}\) is the down-range displacement, \(y_{i}\) is the cross-range displacement, \(h_{i}\) is the altitude, \(V_{gi}\) is the ground speed, \({\gamma }_{i}\) is the flight-path angle, and \({\chi }_{i}\) is the heading angle. The UAV dynamics are given by the following equations:

$$\begin{aligned} \begin{array}{l} \dot{V}_{gi} =\frac{T_{hi} -D_{gi} }{m_i }-g_a \sin \gamma _i \\ \dot{\gamma }_i =\frac{g_a }{V_{gi} }(n_{gi} \cos \phi _{bi} -\cos \gamma _i ) \\ \dot{\chi }_i =\frac{L_{fi} \sin \phi _{bi} }{m_i V_{gi} \cos \gamma _i } \\ \end{array} \end{aligned}$$
(2)

where \(T_{hi}\) is the engine thrust, \(D_{gi}\) is the drag, \(m_i\) is the mass, \(g_{a}\) is the acceleration due to gravity, \(L_{fi}\) is the vehicle lift, and \({\phi }_{bi}\) is the banking angle. The control variables of the UAVs are the g-load \(n_{gi} = {L_{fi} }/{(g_a m_i )}\) controlled by the elevator, the banking angle \({\phi }_{bi}\) controlled by the combination of rudder and ailerons, and the engine thrust \(T_{hi}\) controlled by the throttle. Throughout the formation control process, the control variables are constrained to remain within their respective limits.
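As an illustration, the kinematics and dynamics of Eqs. (1)–(2) can be integrated numerically. The sketch below uses illustrative values for mass, drag, and time step (these numbers are our assumptions, not parameters taken from the paper), and a simple explicit-Euler step rather than whatever integrator the authors used:

```python
import math

def uav_derivatives(state, control, m=20.0, D=5.0, g=9.81):
    """Point-mass UAV model of Eqs. (1)-(2).
    state   = (x, y, h, Vg, gamma, chi)
    control = (Th, ng, phi_b) -- thrust, g-load, bank angle.
    Mass m, drag D, and gravity g are illustrative values."""
    x, y, h, Vg, gamma, chi = state
    Th, ng, phi_b = control
    # Kinematics, Eq. (1)
    dx = Vg * math.cos(gamma) * math.cos(chi)
    dy = Vg * math.cos(gamma) * math.sin(chi)
    dh = Vg * math.sin(gamma)
    # Dynamics, Eq. (2); lift L_f = ng * g * m by the definition of the g-load
    dVg = (Th - D) / m - g * math.sin(gamma)
    dgamma = (g / Vg) * (ng * math.cos(phi_b) - math.cos(gamma))
    dchi = (ng * g * m) * math.sin(phi_b) / (m * Vg * math.cos(gamma))
    return (dx, dy, dh, dVg, dgamma, dchi)

def euler_step(state, control, dt=0.1):
    """One explicit-Euler integration step of the model."""
    derivs = uav_derivatives(state, control)
    return tuple(s + dt * d for s, d in zip(state, derivs))
```

For instance, in trimmed level flight (thrust equal to drag, unit g-load, zero bank angle) the speed, flight-path angle, and heading all stay constant, as the equations require.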

The reduced UAV models can be recast in a matrix form:

$$\begin{aligned} \dot{X}=AX+BU \end{aligned}$$
(3)

with \(X = [x_{1}, \ldots , x_{n}]\), \(U = [u_{1}, \ldots , u_{n}]\), \(u_{i} = [T_{hi}, n_{gi}, \phi _{bi}]^{T}\), \(x_{i} = [x_{i}, y_{i}, h_{i}, V_{gi}, \gamma _{i}, \chi _{i}]^{T}\), \(i=1, \ldots , n\), where \(X \in R^{6\times n}\) and \(U \in R^{3\times n}\) are, respectively, the aggregate states and aggregate control inputs of all UAVs.

2.2.2 Estimated model

Under the assumption of bounds on the control variables and flight speed [13], the above-mentioned UAV model (3) can also be expressed in the following discrete-time form:

$$\begin{aligned} x_{k+1} =Ax_k +Bu_k \end{aligned}$$
(4)

Estimated value for the system states can be obtained by iterating Eq. (4):

$$\begin{aligned} x_{k+i|k}&= A^{i}x_{k|k} +\sum _{j=0}^{i-1} {A^{i-1-j}} Bu_{k+j|k} ,\nonumber \\ i&= 1,2,\ldots ,N \end{aligned}$$
(5)

where \(x_{k+i|k} \) is the state at time \(k+i\) as estimated at time \(k\). When \(i=0\), \(x_{k|k} =x_k \) is simply the current state. To facilitate the study, the estimated states and corresponding control inputs over the time interval \(k+1, \ldots , k+N\) at time \( k \) can be recast in vector form:

$$\begin{aligned} \tilde{x}&= \left( x_{k+1|k} ,x_{k+2|k} ,\ldots ,x_{k+N|k}\right) ^{T},\nonumber \\ \tilde{u}&= \left( u_{k|k} ,u_{k+1|k} ,\ldots ,u_{k+N-1|k}\right) ^{T}. \end{aligned}$$

Thus, an expression using only the current state \(x_{k}\) and the control input sequence \(\tilde{u}\) can be obtained:

$$\begin{aligned} \tilde{x}=H_x x_k +H_u \tilde{u} \end{aligned}$$
(6)

with \(H_{x}~=~\left( A, A^{2}, \ldots , A^{N}\right) ^{T}\) and

$$\begin{aligned} H_u =\left[ {{\begin{array}{cccc} B&{} 0&{} \cdots &{} 0 \\ {AB}&{} B&{} \cdots &{} 0 \\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ {A^{N-1}B}&{} {A^{N-2}B}&{} \cdots &{} B \\ \end{array} }} \right] . \end{aligned}$$
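The matrices \(H_x\) and \(H_u\) can be assembled mechanically from \(A\) and \(B\). A sketch in NumPy (function name and argument layout are our own); it can be checked against direct iteration of Eq. (4):

```python
import numpy as np

def prediction_matrices(A, B, N):
    """Build H_x and H_u of Eq. (6), so that x_tilde = H_x x_k + H_u u_tilde.
    A: (n, n) state matrix, B: (n, m) input matrix, N: prediction horizon."""
    n, m = B.shape
    # H_x stacks A, A^2, ..., A^N
    Hx = np.vstack([np.linalg.matrix_power(A, i) for i in range(1, N + 1)])
    # H_u is lower block-triangular Toeplitz: block (i, j) is A^{i-j} B
    Hu = np.zeros((N * n, N * m))
    for i in range(N):          # block row i predicts x_{k+i+1|k}
        for j in range(i + 1):
            Hu[i*n:(i+1)*n, j*m:(j+1)*m] = np.linalg.matrix_power(A, i - j) @ B
    return Hx, Hu
```

Multiplying a stacked input sequence by these matrices reproduces exactly the states obtained by iterating \(x_{k+1}=Ax_k+Bu_k\) step by step.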

2.3 Cost function design

The target of formation control is to maintain the UAVs in a fixed relative position shape while keeping the required control inputs small; therefore, cost terms on both the system states and the inputs should be included in the cost function [13]. The cost function is designed over the current prediction horizon, so that RHC is coordinated with the optimal solution over a fixed time interval starting from the current time. We choose the 2-norm as the basic item of the cost function; assuming the length of the fixed time interval is \(N\), the cost function of the follower RHC controllers is the following equation:

$$\begin{aligned} \hat{{J}}_{QP}&= \sum _{i=0}^{N-1} \big ( (x_{k+i+1|k} -x_{\mathrm{ref},k+i+1} )^{T}Q(x_{k+i+1|k} -x_{\mathrm{ref},k+i+1} )\nonumber \\&\quad +(u_{k+i|k} -u_{\mathrm{ref},k+i} )^{T} R(u_{k+i|k} -u_{\mathrm{ref},k+i} )\big ) \end{aligned}$$
(7)

where \(x_{\mathrm{ref},k+i+1}\) is the expected system state, \(u_{\mathrm{ref},k+i}\) is the expected control input, and \(u_{k+i|k}\) is the control input estimated for future time \(k+i\) at current time \(k\).

The weighting matrices \(Q\) and \(R\) are positive definite, that is, \(Q = Q^{T} > 0\) and \(R = R^{T} > 0\). To keep the control inputs small, the desired control input is set to zero. Substituting Eq. (6) into Eq. (7), we obtain the following equations:

$$\begin{aligned} \hat{{J}}_{QP}&= \tilde{x}^{T}\tilde{Q}\tilde{x}+\tilde{u}^{T}\tilde{R}\tilde{u}-2\tilde{x}^{T}\tilde{Q}\tilde{x}_{\mathrm{ref}} +\tilde{x}_{\mathrm{ref}}^T \tilde{Q}\tilde{x}_{\mathrm{ref}}\nonumber \\&= \tilde{u}^{T}(H_u^T \tilde{Q}H_u +\tilde{R})\tilde{u}+2(H_x x_k -\tilde{x}_{\mathrm{ref}} )^{T}\tilde{Q}H_u \tilde{u}\nonumber \\&\quad +x_k^T H_x^T \tilde{Q}H_x x_k -2x_k^T H_x^T \tilde{Q}\tilde{x}_{\mathrm{ref}} +\tilde{x}_{\mathrm{ref}}^T \tilde{Q}\tilde{x}_{\mathrm{ref}}\nonumber \\ \end{aligned}$$
(8)

where \(\tilde{Q}=\mathrm{diag}\left\{ {Q,\ldots ,Q} \right\} \), \(\tilde{R}=\mathrm{diag}\left\{ {R,\ldots ,R} \right\} \), and \(\tilde{x}_{\mathrm{ref}} =(x_{\mathrm{ref},k+1} ,x_{\mathrm{ref},k+2} ,\ldots ,x_{\mathrm{ref},k+N} )^{T}\).

The first item of Eq. (8) is the quadratic term in \(\tilde{u}\), the second is the linear term in \(\tilde{u}\), and the last three are constant terms. The constant terms can be ignored in the optimization. We then obtain the standard quadratic form:

$$\begin{aligned} J_{QP}&= \tilde{u}^{T}(H_u^T \tilde{Q}H_u +\tilde{R})\tilde{u}\nonumber \\&\quad +2(H_x x_k -\tilde{x}_{\mathrm{ref}} )^{T}\tilde{Q}H_u \tilde{u}. \end{aligned}$$
(9)

From Eq. (9), the cost function \(J_{QP}\) is a function of the initial state \(x_{k}\) and the control input sequence \(\tilde{u}\), which is the decision variable. For any given initial state \(x_{k}\), solving the quadratic programming problem yields a group of optimal control inputs.
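In the unconstrained case the minimizer of Eq. (9) even has a closed form, \(\tilde{u}^{*}=-(H_u^T\tilde{Q}H_u+\tilde{R})^{-1}H_u^T\tilde{Q}(H_x x_k-\tilde{x}_{\mathrm{ref}})\); with the UAVs' input and state constraints, a constrained QP solver (or, as in this paper, the modified BSO) is needed instead. A minimal sketch of the unconstrained solve, with function and variable names of our choosing:

```python
import numpy as np

def optimal_input_sequence(Hx, Hu, Q, R, N, xk, xref_tilde):
    """Unconstrained minimizer of the quadratic cost in Eq. (9).
    With constraints present, replace the closed-form solve with a QP
    solver or a search method such as the modified BSO of this paper."""
    Qt = np.kron(np.eye(N), Q)               # Q_tilde = diag{Q, ..., Q}
    Rt = np.kron(np.eye(N), R)               # R_tilde = diag{R, ..., R}
    H = Hu.T @ Qt @ Hu + Rt                  # quadratic-term matrix
    f = Hu.T @ Qt @ (Hx @ xk - xref_tilde)   # linear-term vector
    return np.linalg.solve(H, -f)            # u_tilde* = -H^{-1} f
```

On a scalar integrator with no input penalty, the solution reproduces exact reference tracking, which is a quick sanity check of the matrix algebra.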

2.4 RHC formation controller

The elementary process of the RHC formation controller is as follows [13]:

Step 1 :

At sampling time \(k\), the follower state is \(x_{0}\). Solve the optimization problem described in Eq. (9) to obtain the optimal control input sequence over the next \(N\) steps.

Step 2 :

The first item of the sequence is chosen as the follower RHC input, while the remaining \(N-1\) items are discarded.

Step 3 :

Under the action of the control input, the follower system reaches a new state \(x_{1}\) at sampling time \(k+1\).

Step 4 :

Regard the current time and follower state as \(k\) and \(x_{0}\), respectively, and return to Step 1.

For each sampling time, the optimal decision variable \(\tilde{u}\) of the optimization problem depends only on the initial state \(x_{k}\); i.e., there is a state feedback control law \(\tilde{u}^{*}=f(x_k )\), but no analytical relationship between the system state and the control input. The RHC formation controller process is illustrated in Fig. 1b.
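Steps 1–4 above can be sketched as a generic receding-horizon loop. Here `solve_qp` is a caller-supplied placeholder standing in for the solution of Eq. (9) (however it is obtained), and the function name and interface are our assumptions:

```python
import numpy as np

def rhc_loop(A, B, solve_qp, x0, xref_fn, steps=10, N=5):
    """Receding-horizon loop of Steps 1-4.
    solve_qp(xk, xref_tilde) is assumed to return an N-step input
    sequence; only its first element is ever applied."""
    x = x0
    history = [x0]
    for k in range(steps):
        xref_tilde = xref_fn(k, N)       # references over k+1 .. k+N
        u_seq = solve_qp(x, xref_tilde)  # Step 1: optimize over the horizon
        u = u_seq[0]                     # Step 2: keep only the first input
        x = A @ x + B @ u                # Step 3: system evolves one step
        history.append(x)                # Step 4: repeat from the new state
    return history
```

With a trivial deadbeat "solver" on a scalar integrator, the loop drives the state onto the reference in one step and holds it there.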

3 Modified BSO

3.1 Basic BSO

BSO has been successfully applied to generate ideas for very difficult and challenging problems [18]. As presented in [18] and [21], the process of BSO can be described as follows [22]. First, \(N\) ideas are randomly generated within the search space, denoted as \(X_{i} = [x_{i1}, x_{i2}, \ldots , x_{iD}]\), where \(i = 1, 2, \ldots , N\) and \(D\) is the dimension of the optimization problem to be solved; each dimension corresponds to one design variable. Then, each idea is evaluated to obtain its fitness value \(J(X_{i})\), and the iteration begins.

During each generation, the \(N\) ideas are clustered into \(K\) clusters according to their positions, and the best idea in each cluster is recorded as the cluster center. Then, with a probability of \(p_{6a}\), a randomly selected cluster center is replaced with a randomly generated idea.

In the creating operation, BSO first randomly chooses one or two clusters. The selected idea(s) are then updated according to the following equation:

$$\begin{aligned} x_\mathrm{new}&= x_\mathrm{old} +\xi N(\mu ,\sigma ) \nonumber \\ x_\mathrm{old}&= \left\{ {{\begin{array}{ll} x_{ij} , &{} \mathrm{one}\quad \mathrm{cluster} \\ {\omega _1 x_{i1,j} +\omega _2 x_{i2,j} ,} &{} \mathrm{two} \quad \mathrm{clusters} \end{array} }} \right. \end{aligned}$$
(10)

where \(N({\mu }, {\sigma })\) is a Gaussian random value with mean \({\mu }\) and variance \({\sigma }\); \({\omega }_{1}\) and \({\omega }_{2}\) are the weight values of the two ideas; and \(\xi \) is an adjusting factor that slows the convergence speed as the evolution proceeds, expressed as the following equation:

$$\begin{aligned} \xi =\mathrm{logsig}\left( \frac{0.5\times Nc_{\max } -Nc}{K}\right) \times \mathrm{random}(0,1),\nonumber \\ \end{aligned}$$
(11)

where \(\mathrm{random}(0,1)\) is a random value between 0 and 1, \(Nc_\mathrm{max}\) and \(Nc\) denote the maximum and current iteration numbers, respectively, and \(K\) adjusts the slope of the logsig function. This form of \(\xi \) facilitates global search at the beginning of the evolution and enhances local search as the process approaches the end.
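A minimal sketch of the adjusting factor in Eq. (11), assuming `logsig` is the standard logistic sigmoid and with an illustrative slope parameter:

```python
import math
import random

def adjusting_factor(nc, nc_max, k=20.0):
    """Adjusting factor xi of Eq. (11): its deterministic part is large
    early in the run (global search) and small near the end (local
    search); k controls the slope of the logsig curve."""
    logsig = 1.0 / (1.0 + math.exp(-(0.5 * nc_max - nc) / k))
    return logsig * random.random()
```

The random factor keeps step sizes stochastic, while the logsig envelope alone determines the overall decay over the run.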

After a new idea is created, a crossover between the new idea and the old one is conducted. The two ideas generated by crossover, together with the old idea and the created one, are evaluated, and the old idea is replaced with the best of the four.

The process above repeats until all \(N\) ideas have been updated, which completes one generation. The iteration continues until a termination criterion is met, and the best idea is then output as the optimal solution to the problem.

3.2 Modified BSO

Taking a deeper look at the BSO process described above, several improvements deserve consideration.

3.2.1 New clustering method

First, the clustering method can be improved by sorting fitness values. In basic BSO, the \(N\) ideas are clustered into \(K\) clusters during each generation purely according to their positions; two such clustering methods have been proposed, the k-means clustering method and SGM [19]. These methods undeniably introduce considerable randomness and thereby safeguard diversity, but they contribute nothing to generating ideas. Hence, a new clustering standard can be tried. In this paper, the fitness value is adopted as the clustering standard. In the modified BSO, the facilitator implements fitness-value clustering in the following steps:

Step 1 :

Randomly produce \(K\) positive integers whose sum is the constant \(N\). These \(K\) integers are denoted as the numbers of individuals in each cluster, \({nr}_{i}\) \((1\le i \le K)\).

Step 2 :

Calculate the fitness value \(J(X_{i})\) for each idea \(X_{i}\) \((1\le i \le N)\) and sort the values in ascending or descending order (depending on whether the expected fitness value is a minimum or a maximum) to obtain \(J_\mathrm{order}\) and \(Ind_\mathrm{order}\), where \(J_\mathrm{order} \in R^{1\times N}\) is \(J(X_{i})\) sorted in order, and \(Ind_\mathrm{order} \in R^{1\times N}\) is the sequence of original indices of the ideas \(X_{i}\) after sorting by \(J(X_{i})\).

Step 3 :

Assign the idea \(X_{Ind_\mathrm{order} (m)} \) to the \(i\)th group, where \(m=\left( \sum \limits _{j=1}^{i-1}{nr}_{j}+1\right) , \ldots , \left( \sum \limits _{j=1}^{i}{ nr}_{j}\right) \), \(i = 1, \ldots , K\), with \(nr_{0} = 0\). Meanwhile, choose the idea \(X_{Ind_\mathrm{order} \left( \sum \limits _{j=1}^{i-1} {nr_j } +1\right) } \) as the center of the \(i\)th group.
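The three steps above can be sketched as follows; the function name and the particular way the \(K\) random group sizes are drawn are our assumptions (any scheme producing \(K\) positive integers summing to \(N\) would do):

```python
import random

def fitness_clustering(ideas, fitness, K):
    """Fitness-value clustering (Steps 1-3): sort the ideas by fitness
    (ascending, assuming minimization), split the sorted sequence into
    K groups of random sizes summing to N, and take the best idea of
    each group as its center."""
    N = len(ideas)
    # Step 1: K random positive sizes summing to N (one possible scheme)
    cuts = sorted(random.sample(range(1, N), K - 1))
    sizes = [b - a for a, b in zip([0] + cuts, cuts + [N])]
    # Step 2: indices of the ideas sorted by fitness value
    order = sorted(range(N), key=lambda i: fitness[i])
    # Step 3: assign consecutive runs of sorted ideas to the groups
    groups, centers, start = [], [], 0
    for nr in sizes:
        idx = order[start:start + nr]
        groups.append([ideas[i] for i in idx])
        centers.append(ideas[idx[0]])  # best idea of the group is its center
        start += nr
    return groups, centers
```

By construction, the first group is headed by the globally best idea, and each center is the best idea of its own group.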

In essence, the above clustering method classifies ideas by fitness value. This mirrors human behavior: people at a similar level of thought easily reach consensus and gather together, and within each group the owner of the best idea is the most likely to become the cluster center.

It is worth noting that this method does not contradict Osborn's original rules for idea generation in a brainstorming process; in other words, the diversity of the basic algorithm is not destroyed. It simply provides convenience for the later generating steps.

3.3 Local optimal solution

In our simulations, BSO can become trapped in local optima, as can PSO and ABC. In this paper, two solutions to this problem are proposed.

The first method is based on probability updating. In the basic BSO process, the two ideas generated by crossover, together with the old idea and the created one, are evaluated, and the old idea is replaced with the best of the four [22]. In the following exposition, we assume that a smaller cost function value is better. Probability updating instead replaces the old idea with the best one with a probability [23]:

$$\begin{aligned} T(X_i \rightarrow X_j )=\frac{1}{1+\exp [(\omega _j -\omega _i )/\kappa ]} \end{aligned}$$
(12)

where \(\kappa \) characterizes the intensity of updating, \({\omega }_{i}\) represents the cost value of the old idea, and \({ \omega }_{j}\) represents the cost value of the best one. For \({ \kappa } = 0\), \(X_{i}\) is replaced by \(X_{j}\) deterministically whenever \({\omega }_{j} < { \omega }_{i}\). For \({\kappa } > 0\), worse-performing ideas are also retained with a certain probability, although in most cases better ideas still replace worse ones. In line with most previous studies, we set \(\kappa = 0.1\) (strong updating).

The specific replacement rule is as follows:

$$\begin{aligned} X_i =\left\{ {{\begin{array}{ll} X_j\qquad \mathrm{rand}\le T \\ X_i\qquad \, \mathrm{rand}>T \\ \end{array} }} \right. . \end{aligned}$$
(13)

By occasionally tolerating worse ideas, this method preserves population diversity, which is exactly what the algorithm requires.
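Equations (12)–(13) amount to a stochastic acceptance rule, sketched below (the function name is ours; \(\kappa = 0.1\) follows the text):

```python
import math
import random

def prob_update(old_cost, best_cost, kappa=0.1):
    """Probability-updating rule of Eqs. (12)-(13): replace the old idea
    with the best candidate with probability T, so that even a worse
    candidate occasionally survives and diversity is preserved."""
    T = 1.0 / (1.0 + math.exp((best_cost - old_cost) / kappa))
    return random.random() <= T  # True: adopt the candidate
```

For large cost gaps the rule is nearly deterministic; only near-ties are decided stochastically.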

The second, more traditional, method is chaos. Chaos is the highly unstable motion of deterministic systems in a finite phase space, which often arises in nonlinear systems [24]. Consider the well-known logistic equation:

$$\begin{aligned} x_{n+1} =4x_n (1-x_n ) \end{aligned}$$
(14)

where \(0 < x_{n} < 1\). A very small difference in the initial value of \(x\) gives rise to a large difference in its long-time behavior, which is the basic characteristic of chaos. The track of a chaotic variable travels ergodically over the whole space of interest, and its variation follows a delicate inherent rule even though it appears disordered. Therefore, after each search round, we conduct a chaotic search in the neighborhood of the current optimal parameters by generating a certain number of new parameters through the chaotic process. In this way, the ergodicity and irregularity of the chaotic variable help the algorithm jump out of local optima and find the optimal parameters. The specific procedure is as follows:

Step 1 :

Conduct the chaotic search around the parameters of the best idea \(X_{i}\) based on Eq. (14), after transforming the parameter ranges into (0, 1).

Step 2 :

Among the generated series of solutions, select the best one and use it to replace the former best idea.
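The two-step chaotic search can be sketched as follows, using the logistic map of Eq. (14); the bounds handling, the number of chaotic points, and the function name are our assumptions:

```python
def chaotic_search(best, cost_fn, lo, hi, n_points=50):
    """Chaotic local search (Steps 1-2) around the best parameters.
    Each parameter is mapped into (0, 1), iterated with the logistic
    map x <- 4x(1-x) of Eq. (14), mapped back, and the best candidate
    found replaces `best` if it improves the cost."""
    span = [h - l for l, h in zip(lo, hi)]
    # transform the current best into (0, 1), nudging off the fixed points
    z = [min(max((b - l) / s, 1e-6), 1 - 1e-6)
         for b, l, s in zip(best, lo, span)]
    best_cost = cost_fn(best)
    for _ in range(n_points):
        z = [4.0 * zi * (1.0 - zi) for zi in z]          # logistic iteration
        cand = [l + zi * s for zi, l, s in zip(z, lo, span)]
        c = cost_fn(cand)
        if c < best_cost:
            best, best_cost = cand, c
    return best, best_cost
```

Because the logistic map keeps each coordinate in (0, 1), every candidate remains inside the transformed parameter range.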

3.3.1 Learning behavior

Looking at the overall design of BSO, one notices that the algorithm pursues diversity while losing sight of the goal of finding the optimal solution as quickly as possible.

During each generation, many ideas can and should be based on ideas already generated [18]; any generated idea can serve as a clue to generate more ideas. Picking up several good ideas from those generated so far causes the brainstorming group to pay more attention to the ideas it believes to be better; the picked-up ideas act like point attractors for the idea generation process. In the algorithm, cluster centers are selected with higher probability than other ideas in the creating and updating section, so the cluster centers are treated with higher priority [22].

In a nutshell, the cluster centers holding better ideas merely attract attention but do not play a guiding role. In a group, the center not only deserves closer attention, but is also duty bound to guide the other individuals appropriately; that is, other individuals should learn from the center of their group. Following the example of PSO, each individual updates its idea according to the following equations [25]:

$$\begin{aligned} \begin{array}{l} X_i^\mathrm{new}=X_i^\mathrm{old} +V_i \\ V_i =\omega V_i +c_1 r_1 \left( P_{id}-X_i^\mathrm{old} \right) +c_2 r_2 \left( P_{gd}-X_i^\mathrm{old} \right) \\ \end{array} \end{aligned}$$
(15)

where \(P_{id}\) is the idea of the center of group \(m\), \(m\) being the group containing individual \(i\); \(P_{gd}\) is the best idea among the \( K \) centers at present; \(\omega \) is the inertia weight; \(c_{1}\) and \(c_{2}\) are positive constants; and \(r_{1}\) and \(r_{2}\) are two random numbers in the range [0, 1], with \(1\le i \le N_{c}\). In this paper, we set \(\omega = 0.5\), \(c_1 =c_2 =2\), and we restrict \(X_{i}\) to the range [0, 1]\(^{D}\) and \(V_{i}\) to the range [0, 0.1]\(^{D}\).
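A sketch of this learning step for a single idea, following Eq. (15) with the stated parameter values; the clamping of \(V_i\) to [0, 0.1] follows the text literally, and the function name is ours:

```python
import random

def learning_update(x, v, p_center, p_best, w=0.5, c1=2.0, c2=2.0, v_max=0.1):
    """PSO-style learning step of Eq. (15): an idea x (with velocity v)
    is pulled toward its own cluster center p_center and toward the
    best center p_best; all vectors are per-dimension lists."""
    new_x, new_v = [], []
    for xi, vi, pid, pgd in zip(x, v, p_center, p_best):
        r1, r2 = random.random(), random.random()
        vi = w * vi + c1 * r1 * (pid - xi) + c2 * r2 * (pgd - xi)
        vi = min(max(vi, 0.0), v_max)     # V_i restricted to [0, 0.1]
        xi = min(max(xi + vi, 0.0), 1.0)  # X_i restricted to [0, 1]
        new_x.append(xi)
        new_v.append(vi)
    return new_x, new_v
```

Because both attractors pull in the same direction when the centers lie on the same side of the idea, the update moves the idea toward its group's center and the global-best center.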

The modified BSO process can be illustrated by Fig. 3.

Fig. 3
figure 3

Modified BSO process

4 Comparative experimental results

In a typical multi-UAV formation flight, the followers follow the trajectory of the leader UAV, taking the other aircraft as references to keep their own positions inside the formation [16]. In a large formation, inter-aircraft distances must be kept constant. The formation model in this paper adopts a leader-based strategy (as shown in Fig. 4): each follower UAV takes its trajectory references from the leader UAV, the altitude is the same for all, and the leader UAV is in charge of the formation trajectory.

Fig. 4
figure 4

Multiple UAV formation

To test the effectiveness of the modified algorithm, the optimization of a UAV formation model is chosen as the benchmark [22]. Thirty RHC cycles were conducted in each simulation, and within a single RHC cycle the optimization algorithm loops 100 times. As shown in Table 1, this benchmark problem has three design variables and one cost function value. Our objective is to minimize the cost function value, indirectly determined by these design variables, under the constraints shown in Table 2 and the initial states shown in Table 3. The performance of the modified algorithm is compared with that of basic BSO and the popular population-based algorithm PSO. The control parameters of BSO and the modified BSO are given in Table 4; the control parameters of PSO, modeled after [25], are given in Table 5.

Table 1 Optimization parameters
Table 2 Constraints
Table 3 Initial states of UAVs
Table 4 Control parameters of BSO and the modified BSO
Table 5 Control parameters of PSO

To clearly illustrate the performance of the modified BSO within a single RHC process, the evolution curves after 100 generations in the first round of the RHC simulation cycle are compared with those of basic BSO and PSO in Fig. 8a. Owing to probability updating and chaos optimization, the modified BSO exhibits a better ability to jump out of local optima [22]. When the individuals converge to a local optimum, indicated by the horizontal retention parts of the evolution curve, the modified BSO takes far fewer generations to escape and find a position with a smaller cost function value.

Since its evolution curve always lies below the others, the modified BSO also gives a satisfactory final optimization result within a single RHC simulation process. Evidently, the learning behavior toward the respective cluster centers and the best cluster center plays a crucial role in speeding up the search for the optimal solution within the global scope.

The final results after 30 RHC simulations are presented in Tables 6–9. Without loss of generality, in the simulation the leader flies level at constant speed in a straight line. From the data in the tables, we can conclude that the modified BSO, basic BSO, and PSO can all achieve the optimization goal of forming a comparatively stable flight formation: the down-range displacement \(x_{i}\), cross-range displacement \(y_{i}\), altitude \(h_{i}\), ground speed \(V_{gi}\), flight-path angle \({\gamma }_{i}\), and heading angle \({\chi }_{i}\) of the four followers track their reference values reasonably well.

Table 6 Simulation results of leader
Table 7 Simulation results of \(J\)
Table 8 Simulation results of \(x_{i},\, y_{i}\) and \(h_{i}\) by different methods
Table 9 Simulation results of \(V_{gi},{\gamma }_{i}\) and \({\chi }_{i}\) by different methods

Figures 5–7 show the detailed results generated by the control sequences optimized by the three algorithms, respectively, in which (a)–(d) describe the three-dimensional formation trajectories, relative velocities, heading angles, and flight-path angles of the UAVs.

Fig. 5
figure 5

Detailed results generated by the control sequence optimized by the modified BSO

Fig. 6
figure 6

Detailed results generated by the control sequence optimized by BSO

Fig. 7
figure 7

Detailed results generated by the control sequence optimized by PSO

As shown in (c) of Figs. 5–7, the RHC control parameters solved by each of the three algorithms make the relative heading angles \({ \chi }_{i}\) of the followers converge to the leader's at about 1.5 s with minor fluctuations. However, the performance of the three algorithms on the velocities \(V_{gi}\) and flight-path angles \({ \gamma }_{i}\) differs considerably. As observed in Fig. 5b, d, with the modified BSO the relative velocities and flight-path angles of the UAVs reach agreement at about 2 and 1.5 s, respectively. Basic BSO is inferior to the modified algorithm: the follower velocities in Fig. 6b fluctuate irregularly within a 5 m/s amplitude, and the follower flight-path angles in Fig. 6d track the leader's state stably only at 2 s, 0.5 s behind the modified BSO. PSO, by contrast, performs worst, as the fluctuations shown in Fig. 7b, d have larger amplitude and higher frequency.

From the three-dimensional formation trajectories in (a) of Figs. 5–7, the aggregate performance of the three algorithms can be observed directly. Follower 2 in (a) of Figs. 6 and 7 takes a rather awkward path before converging to its designated position, whereas with the help of the modified algorithm it finds a relatively better way (Fig. 5a) to reach the desired position and keep formation. The formation trajectories in Fig. 5a show better stability, a faster convergence rate, and better tracking ability than the other two, indicated by smoother curves and a longer stage of formation keeping.

The above analysis shows that the modified BSO basically achieves its optimization purpose of forming a stable formation. To compare the three algorithms clearly and straightforwardly, the cost-function-value evolution curves after 30 RHC simulations (shown in Fig. 8b, c) are presented for numerical comparison. In Fig. 8b, the evolution curve of the modified BSO always lies below those of the other two algorithms; in other words, the modified algorithm finds better control parameters in each RHC simulation cycle, so the result of one RHC cycle provides a good starting point for the next. In Fig. 8c, a logarithmic vertical axis is chosen to facilitate the analysis of formation stability and tracking ability. The evolution curve of the modified algorithm converges to the smallest steady value at the fastest speed, which means that the UAVs settle into a relatively stable formation rapidly and exhibit better consistency as a group.

Fig. 8
figure 8

Evolution curve

To sum up, thanks to the aforementioned improvements, the modified BSO does show clear superiority over basic BSO and PSO in jumping out of local optima and speeding up the optimization process.

5 Conclusions

This paper presents a modified BSO approach to the formation flight optimization problem based on the nonlinear RHC model of UAVs. The proposed algorithm enhances basic BSO in jumping out of local optima and speeding up convergence through the following improvements:

  1. (1)

    A new clustering method that sorts ideas by cost function value;

  2. (2)

    Probability updating, which replaces the old idea with the best one with a probability;

  3. (3)

    Chaotic search around the best idea, utilizing the ergodicity and irregularity of chaotic variables;

  4. (4)

    Learning behavior toward the respective cluster center and the best cluster center.

The comparative simulation results show that the modified BSO, basic BSO, and PSO can all achieve the optimization goal of forming a comparatively stable flight formation, and that the modified BSO has clear superiority in jumping out of local optima and speeding up the optimization process. Our future work will focus on a novel bio-inspired algorithm named Pigeon-Inspired Optimization (PIO) [26] to provide new methods for multi-UAV control problems.