1 Introduction

Collective behavior is pervasive in nature and is observed in diverse organisms ranging from microscopic bacteria (Allison and Hughes 1991) to large-scale flocks of birds and insects (Nagy et al. 2010; Czaczkes et al. 2015). While researchers are still investigating the mechanisms underlying such behavior, moving in groups can offer several advantages, such as avoiding predators or carrying cargo collectively (Ron et al. 2018). Emulating these natural systems has gained popularity in recent years, and the development of a robust, fault-tolerant, and generalized swarm of robots is now a widely studied problem (Saffre et al. 2021; Coppola et al. 2020). Aerial swarms, owing to their high maneuverability and speed, find numerous applications across industries. They can be deployed as counter-drone measures (Brust et al. 2017) or for basic surveillance operations in a defense scenario. In Abraham et al. (2019), the authors utilize the sensing capability of multiple robots to produce topographical and population density maps of a disaster-afflicted area. In Tosato et al. (2019), a centralized swarm architecture was proposed for measuring air pollution; such a system could reduce measurement error owing to the larger sample size and the distribution of data points over the coverage volume. Mixed aerial and ground swarms have also been used to automate construction tasks (Krizmancic et al. 2020). In Ju and Son (2018), multiple UAVs were shown to outperform a single UAV for tasks like agricultural sensing and monitoring, as measured by metrics such as energy consumption, flight time, and area coverage. In general, drone swarms can be classified into three categories in order of increasing complexity (Kumar 2020):

  • Coordinated: This refers to the collective movement with basic environmental awareness and collision avoidance.

  • Cooperative: Here, the robots work together to achieve a particular goal using fewer resources than a single drone.

  • Collaborative: This refers to multiple drones working together irrespective of their nature, i.e., heterogeneous collaboration.

This paper proposes a methodology to solve the drone swarm coordination problem with multiple conflicting objectives. This document uses the terms drone, UAV, and aerial robot interchangeably.

Developing a robust velocity controller that allows multiple drones to self-organize comes with its own challenges. According to the taxonomy defined in Coppola et al. (2020), velocity control is one of the methods under classical swarming behavior, i.e., deciding on a high-level control policy with information shared across each agent's neighbors. The challenge is to take this shared information (the agent's state as well as the states of its neighbors) and design controllers (functions) that output an instantaneous velocity vector for each agent. Over time, the agents' velocities give rise to patterns and mutual interactions from which self-organizing behavior can emerge. The first algorithm of this kind, by Reynolds, was based on simple rules for each agent and has been successfully applied in many fields (Hauert et al. 2011; Dewi et al. 2011; Moere 2004). In Vásárhelyi et al. (2018), the authors address this problem by defining a single fitness function and optimizing it with the CMA-ES algorithm. However, that study does not consider multiple conflicting objectives, whose priorities can vary depending on the scenario. A comprehensive study in Fine and Shell (2013) formalizes flocking behaviors and unifies the literature by presenting a data-flow template for the various flocking stages. In Márquez-Vega et al. (2021), a multi-objective solution for quad-rotors is proposed; however, the swarm size is limited, and the full range of solutions considering the relations among the fitnesses is not explored. We explore these relationships using unsupervised learning and extend our findings to highlight the value of obtaining a non-dominated solution set for drone flocking.

Modeling natural processes through simulation often needs to be complemented by an in-depth qualitative understanding of the performance measures. Unsupervised learning can help understand and cluster data, especially in high-dimensional spaces that cannot be visualized. It is widely used in experiments where abundant data is available, such as mapping vulnerability indices (Abson et al. 2012), understanding relationships between economic and environmental objectives in a chemical supply chain (Pozo et al. 2012), understanding global motions of atoms in proteins (Loeffler and Kitao 2009), and, most commonly, dimensionality reduction in evolutionary algorithms (Deb and Saxena 2006). Like many natural systems, the solution to an optimization problem depends on various factors. Often, these factors or objectives conflict and cannot be optimized simultaneously without compromising the overall fitness. In the case of flocking, we consider six objectives from Vásárhelyi et al. (2018) (referred to as the Vásárhelyi et al. model from here onwards):

  • Collision avoidance with the wall.

  • Collision avoidance with each other.

  • Average speed of the swarm.

  • Average velocity alignment or correlation.

  • Total number of connected agents.

  • Total number of disconnected agents.

We use PCA to understand the collective dynamics of multi-agent systems and thereby reduce the multi-objective optimizer's objective space. To the best of our knowledge, this work is the first attempt to use PCA to reduce the objective functions of a drone flocking optimization problem.

These objectives are then optimized via a well-established multi-objective optimizer (NSGA-II) to yield a Pareto front that can guide decision-making and trade-offs in various situations. We report the results and show that the solutions at the extremities of the Pareto front perform better than those of the CMA-ES algorithm. We conclude by giving practical examples of how such abstract mathematical formalism supports real-time decision-making with a flock of UAVs.

In short, in this research, we create a drone swarm simulator integrated with a multi-objective solver, use PCA to understand the collective dynamics of swarms, and obtain a Pareto front representing different swarm configurations that can be used in real-world scenarios. The rest of the paper is organized as follows: Sect. 2 presents the background of Principal Component Analysis (PCA) and the multi-objective optimizer (NSGA-II). The drone flocking optimization problem is formulated in Sect. 3. In Sect. 4, PCA is used to reduce the number of objective functions, followed by a discussion of the correlations. Section 5 presents the experimental setup, and the numerical results and discussions are given in Sect. 6. The paper concludes with some potential use cases and possible future work.

2 Background

2.1 Principal component analysis

A high-dimensional objective space suffers from poor selection pressure and convergence (Deb and Saxena 2006). It is also challenging to visualize such a space and gain the intuition often required for appropriate decision-making. Principal component analysis, a technique from the domain of unsupervised learning, can help uncover the underlying structure of data without explicit labels. The idea is to compute the eigenvectors of an m-dimensional covariance matrix (K), which are then used to identify redundant objectives. Here, m is the number of objectives. This covariance matrix is symmetric, and its elements give the relations between the variables on which the analysis is run. Such an analysis of the objectives of an optimization problem can provide insights into their correlations and help in understanding their qualitative aspects. The process to determine K is given in Appendix A.
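As a minimal illustration (using NumPy, with names of our own choosing rather than anything from the paper's code), the covariance matrix K and its eigendecomposition can be computed as follows:

```python
import numpy as np

def principal_components(F):
    """PCA on an (n_samples x m) matrix of objective values."""
    Fc = F - F.mean(axis=0)               # center each objective
    K = np.cov(Fc, rowvar=False)          # m x m covariance matrix
    eigvals, eigvecs = np.linalg.eigh(K)  # K is symmetric: use eigh
    order = np.argsort(eigvals)[::-1]     # sort by explained variance
    return K, eigvals[order], eigvecs[:, order]
```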

2.2 Non-dominated sorting genetic algorithm-II

NSGA-II is a multi-objective optimization algorithm that ranks each solution in the population according to its fitness and progressively produces better solutions using genetic operators such as selection, crossover, and mutation. We use NSGA-II in our work to trade off the reduced fitnesses against each other. This trade-off is represented by Pareto fronts, which are made up of the non-dominated solutions within an evolutionary population. The entire algorithm is explained in detail in Deb et al. (2002); a brief overview of its salient features is also given in Appendix A.

3 Drone flocking optimization problem

A completely decentralized flocking swarm is based on simple rules such as separation, alignment, and cohesion. When expressed through a velocity control algorithm, these rules carry specific parameters that can be tuned for optimal flocking. In this section, these parameters are introduced, and a simulation framework capable of handling artificial sensor noise is described. The flocking algorithm is based on the Vásárhelyi et al. model and Reynolds' flocking model (Reynolds 1987), with some subtle modifications to fit a multi-objective optimization framework. We use vectorized versions of the equations to leverage fast matrix computation libraries.

To simulate a multi-agent system, there must be a mechanism to share information across the agents. In a decentralized system, this information is shared within each agent's neighborhood \(\mathcal {N}_{o}\). Moreover, real systems are characterized by stochastic uncertainty and noise, which are incorporated into the position (\({\textbf {r}}\)) and velocity (\({\textbf {v}}\)) vectors of the drones. The model for simulating the noise and environmental effects is taken from Virágh et al. (2014). The relative position (\({\textbf {r}}_{ji}\)) and velocity (\({\textbf {v}}_{ji}\)) at time t are then found using the following equations:

$$\begin{aligned} {\textbf {r}}_{ji}(t)&=({\textbf {r}}_{j}(t-t_{\mathrm {del}}^{\mathrm {comm}})+{\textbf {r}}_{j}^{gps})-{\textbf {r}}_{i}(t)-{\textbf {r}}_{i}^{gps} \end{aligned}$$
(1)
$$\begin{aligned} {\textbf {v}}_{ji}(t)&=({\textbf {v}}_{j}(t-t_{\mathrm {del}}^{\mathrm {comm}})+{\textbf {v}}_{j}^{gps})-{\textbf {v}}_{i}(t)-{\textbf {v}}_{i}^{gps} \end{aligned}$$
(2)
$$\begin{aligned} R^{\mathrm {rel}}_{j}&= {\textbf {r}}_{ji}(t) \end{aligned}$$
(3)
$$\begin{aligned} V^{\mathrm {rel}}_{j}&= {\textbf {v}}_{ji}(t) \end{aligned}$$
(4)

  • where,

  • \({\textbf {r}}_{ji} \equiv \) Relative position vector of the \(j\hbox {th}\) agent with respect to the \(i\hbox {th}\) agent at time t

  • \({\textbf {v}}_{ji} \equiv \) Relative velocity vector of the \(j\hbox {th}\) agent with respect to the \(i\hbox {th}\) agent at time t

  • \(R^{\mathrm {rel}}_{j} \equiv \) \(j\hbox {th}\) row of the relative position matrix for agent \(i\), \(\forall \ j = 1,2,\ldots ,\mathcal {N}_{o}\)

  • \(V^{\mathrm {rel}}_{j} \equiv \) \(j\hbox {th}\) row of the relative velocity matrix for agent \(i\), \(\forall \ j = 1,2,\ldots ,\mathcal {N}_{o}\)

  • \(t_{\mathrm {del}}^{\mathrm {comm}} \equiv \) Simulated communication delay

  • \({\textbf {r}}^{gps} \equiv \) Simulated GPS noise for position

  • \({\textbf {v}}^{gps} \equiv \) Simulated GPS noise for velocity
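For concreteness, a vectorized sketch of Eqs. (1)-(4) is given below. It assumes position and velocity histories stored as arrays so that the communication delay becomes an index offset; all names (`r_hist`, `nbrs`, etc.) are illustrative and not taken from MOflock:

```python
import numpy as np

def relative_states(i, r_hist, v_hist, n_r, n_v, d, nbrs):
    """Sketch of Eqs. (1)-(4): delayed, noise-corrupted relative states of
    agent i's neighbors. r_hist/v_hist are lists of (N x 2) arrays over time,
    n_r/n_v are the simulated GPS noise arrays, d is the communication delay
    in time steps, and nbrs indexes the neighborhood of agent i."""
    r_t, v_t = r_hist[-1], v_hist[-1]          # true states at time t
    r_d, v_d = r_hist[-1 - d], v_hist[-1 - d]  # states at t - t_del^comm
    R_rel = (r_d[nbrs] + n_r[nbrs]) - r_t[i] - n_r[i]  # Eq. (1); rows = Eq. (3)
    V_rel = (v_d[nbrs] + n_v[nbrs]) - v_t[i] - n_v[i]  # Eq. (2); rows = Eq. (4)
    return R_rel, V_rel
```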

3.1 Decision variables

The flocking rules are explained in the following sections based on the above formulation of the relative positions and velocities. These rules give rise to certain parameters, which serve as the decision variables of the drone flocking optimization problem. Note that all flocking operations are written for the \(i\hbox {th}\) agent and are carried out for all N agents. All operations involving vectors/matrices and scalars are performed element-wise unless mentioned otherwise.

3.1.1 Separation

To flock effectively without collisions, the agents must have a mechanism for repulsion. A spring-like mechanism is used, which is activated at short ranges of inter-agent distance in the flock. The model for the same is taken from Vásárhelyi et al. (2018) and is delineated in Appendix B.

3.1.2 Alignment

Vásárhelyi et al. (2018) realized that effective control of both the magnitude and direction of velocities as a function of inter-agent distances could yield the best alignment with scalable velocities. We use the same model in our work. The equations for alignment are given in Appendix B for the sake of completeness.

3.1.3 Wall collisions

To account for collisions at walls, the Vásárhelyi et al. model proposes virtual "shill" agents at the walls with which the actual agents try to align their velocities. In the original model, this shill interaction has no tunable gain, so repulsion at walls takes place to the maximum extent (\(c^{\mathrm {shill}}=1\)). This makes sense when flocking in confined environments, because one of the primary goals is to avoid the wall at any cost. In our research, however, while seeking a non-dominated set of solutions (see Sect. 6), we characterize the elasticity of the virtual geo-fence with a shill gain parameter (\(c^{\mathrm {shill}}\)). The following equations find a shill velocity vector from each wall for the agent to align with. \({\textbf {r}}_{ci}\) is the relative position vector from the agent to the arena's center \({\textbf {r}}_{c}\); this vector is used to find the distances to the walls in Eq. (6). Eq. (8) gives an \(m \times m\) matrix whose rows are the shill vectors from each wall. We assume a square geo-fence centered at (0,0) in our research, but trivial modifications to Eqs. (5) and (6) generalize it to other shapes.

$$\begin{aligned} {\textbf {r}}_{ci}&= {\textbf {r}}_{c} -{\textbf {r}}_{i} \end{aligned}$$
(5)
$$\begin{aligned} {\textbf {r}}_{s}^{\mathrm {mag}}&= L_{c}/2 - \vert {\textbf {r}}_{ci} \vert \end{aligned}$$
(6)
$$\begin{aligned} {\textbf {v}}^{shillmax}_{i}&= D({\textbf {r}}^{\mathrm {mag}}_{s}- r_{0}^{\mathrm {shill}}, a^{\mathrm {shill}}, p^{\mathrm {shill}}) \end{aligned}$$
(7)
$$\begin{aligned} V_{s}&= (v^{\mathrm {shill}} . \frac{{\textbf {r}}_{ci}}{\vert {\textbf {r}}_{ci} \vert }) \odot I \end{aligned}$$
(8)
$$\begin{aligned} {\textbf {v}}_{s}^{\mathrm {mag}}&= \Vert V_{s} - {\textbf {v}}_{i} \Vert \ _{\top {\textbf {v}}^{shillmax}_{i}} \end{aligned}$$
(9)
$$\begin{aligned} V^{\mathrm {shill}}&= c^{\mathrm {shill}}.({\textbf {v}}_{s}^{\mathrm {mag}}-{\textbf {v}}^{shillmax}_{i}).\frac{V_{s}}{{\textbf {v}}_{s}^{\mathrm {mag}}} \end{aligned}$$
(10)
$$\begin{aligned} {\textbf {v}}^{\mathrm {shill}}_{i}&= \sum _{k=1}^{m} V^{\mathrm {shill}}_{k} \end{aligned}$$
(11)
  • where,

  • \({\textbf {r}}_{c} \equiv \) Absolute position of the center of the arena

  • \(L_{c} \equiv \) Side length of the arena

  • \({\textbf {r}}_{ci} \equiv \) Relative position of the center with respect to the agent

  • \(p^{\mathrm {shill}} \equiv \) Slope for the linear part of the decay curve (user-dependent parameter)

  • \(a^{\mathrm {shill}} \equiv \) Acceleration for the non-linear part of the decay curve (user-dependent parameter)

  • \(c^{\mathrm {shill}} \equiv \) Overall Gain for shilling alignment (user-dependent parameter)

  • \(v^{\mathrm {shill}} \equiv \) Speed of shilling agents (user-dependent parameter)

  • \(r_{0}^{\mathrm {shill}} \ \equiv \) Alignment cutoff distance for maximum alignment (user-dependent parameter)

  • \(V^{\mathrm {shill}} \ \equiv \) \(m \times 2\) sized matrix of scaled shilling velocities for all dimensions

  • \(V^{\mathrm {shill}}_{k} \ \equiv \) Alignment velocity of \(k\hbox {th}\) dimension i.e. \(k\hbox {th}\) row of \(V^{\mathrm {shill}}\)

  • \({\textbf {v}}^{\mathrm {shill}}_{i}\ \equiv \) Desired collective shilling vector for \(i\hbox {th}\) agent.

Here, I is the identity matrix, and \(\odot \) is the Hadamard product. Eqs. (9)-(11) follow the same velocity alignment procedure as the friction term in Sect. 3.1.2, but applied to each wall's shill velocity instead of each neighbor's velocity.
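A sketch of Eqs. (5)-(11) for the 2-D square arena is given below. Two reading assumptions are made explicit: the braking curve D takes the standard two-regime form of the Vásárhelyi et al. model (its exact form is given in Appendix B), and the norm subscript in Eq. (9) is read as a clamp from below at \({\textbf {v}}^{shillmax}_{i}\), so that only the excess velocity difference above the braking curve contributes, as in the friction-type alignment term:

```python
import numpy as np

def D(r, a, p):
    # Assumed braking curve from the Vasarhelyi et al. model (Appendix B):
    # linear decay near the cutoff, square-root decay farther out.
    r = np.maximum(r, 0.0)
    return np.where(r * p**2 < a,
                    r * p,
                    np.sqrt(np.maximum(2.0 * a * r - a**2 / p**2, 0.0)))

def wall_shill_velocity(r_i, v_i, r_c, L_c, r0, v_sh, a_sh, p_sh, c_sh):
    """Illustrative sketch of Eqs. (5)-(11) for one agent in a square arena,
    reading r_ci / |r_ci| element-wise (per-axis sign toward the center)."""
    r_ci = r_c - r_i                            # Eq. (5)
    r_s_mag = L_c / 2.0 - np.abs(r_ci)          # Eq. (6): per-axis wall distance
    v_max = D(r_s_mag - r0, a_sh, p_sh)         # Eq. (7): allowed difference
    V_s = np.diag(v_sh * np.sign(r_ci))         # Eq. (8): one shill vector per wall axis
    diff = np.linalg.norm(V_s - v_i, axis=1)    # per-wall velocity difference
    v_s_mag = np.maximum(diff, v_max)           # Eq. (9): clamp from below
    scale = c_sh * (v_s_mag - v_max) / np.maximum(v_s_mag, 1e-12)  # Eq. (10)
    return (scale[:, None] * V_s).sum(axis=0)   # Eq. (11)
```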

The above three velocity terms (Sects. 3.1.1-3.1.3), together with the agent's current velocity normalized to the flocking speed, are summed to give the desired velocity for the respective agent (Eq. 12), whose magnitude is then bounded by the maximum speed (Eq. 13). This desired velocity is further used to update the current velocity and position sequentially using a first-order Euler integration method, as sketched after the equations.

$$\begin{aligned} {\textbf {v}}_{i}^{desired}&= \frac{{\textbf {v}}_{i}}{\Vert {\textbf {v}}_{i} \Vert } v^{\mathrm {flock}} + {\textbf {v}}_{i}^{\mathrm {rep}} + {\textbf {v}}_{i}^{\mathrm {frict}} + {\textbf {v}}_{i}^{\mathrm {shill}} \end{aligned}$$
(12)
$$\begin{aligned} {\textbf {v}}_{i}^{desired}&\longleftarrow min\{v^{max}, \Vert {\textbf {v}}_{i}^{desired} \Vert \} \ \frac{{\textbf {v}}_{i}^{desired}}{\Vert {\textbf {v}}_{i}^{desired} \Vert } \end{aligned}$$
(13)
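A compact sketch of Eqs. (12)-(13) and the subsequent Euler update is shown below. The paper does not spell out the acceleration model of the update at this point, so the bounded-acceleration step (`a_max`) is our assumption:

```python
import numpy as np

def update_agent(r_i, v_i, v_rep, v_frict, v_shill, v_flock, v_max, a_max, dt):
    v_des = v_i / np.linalg.norm(v_i) * v_flock + v_rep + v_frict + v_shill  # Eq. (12)
    speed = max(np.linalg.norm(v_des), 1e-12)
    v_des *= min(v_max, speed) / speed          # Eq. (13): bound the speed
    dv = v_des - v_i                            # assumed bounded-acceleration relaxation
    step = np.linalg.norm(dv)
    if step > a_max * dt:
        dv *= a_max * dt / step
    v_new = v_i + dv                            # sequential first-order Euler update:
    r_new = r_i + v_new * dt                    # velocity first, then position
    return r_new, v_new
```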

Finally, the resulting set of 12 parameters to optimize is:

$$\begin{aligned} x = \{ r_{0}^{\mathrm {sep}}, p^{\mathrm {rep}}, r_{0}^{\mathrm {frict}}, a^{\mathrm {frict}}, p^{\mathrm {frict}}, v^{\mathrm {frict}}, c^{\mathrm {frict}}, r_{0}^{\mathrm {shill}}, v^{\mathrm {shill}}, a^{\mathrm {shill}}, p^{\mathrm {shill}}, c^{\mathrm {shill}} \} \end{aligned}$$

3.2 Fitness functions

Order parameters are defined and passed through transfer functions to obtain the fitnesses, which measure the performance of one simulation run.

$$\begin{aligned} \begin{aligned} \mathcal {F}^{\mathrm {speed}}&= {\textbf {F}}_{1}(\phi ^{\mathrm {vel}}, v^{\mathrm {flock}}, v^{tol}) \\ \mathcal {F}^{\mathrm {coll}}&= {\textbf {F}}_{3}(\phi ^{\mathrm {coll}}, a^{tol}) \\ \mathcal {F}^{\mathrm {wall}}&= {\textbf {F}}_{2}(\phi ^{\mathrm {wall}}, r^{tol}) \\ \mathcal {F}^{\mathrm {corr}}&= \Theta (\phi ^{\mathrm {corr}})\phi ^{\mathrm {corr}} \\ \mathcal {F}^{\mathrm {disc}}&= {\textbf {F}}_{3}(\phi ^{\mathrm {disc}}, N/5) \\ \mathcal {F}^{\mathrm {cluster}}&= {\textbf {F}}_{3}(\phi ^{\mathrm {cluster}}, N/5) \end{aligned} \end{aligned}$$
(14)

Here, the order parameters \(\phi ^{\mathrm {vel}}, \phi ^{\mathrm {coll}}, \phi ^{\mathrm {corr}}, \phi ^{\mathrm {wall}}\) and transfer functions \({\textbf {F}}_{1}\), \({\textbf {F}}_{2}\), \({\textbf {F}}_{3}\) are taken from the Vásárhelyi et al. model, and \(\Theta \) is the Heaviside step function. The tolerance parameters \(r^{tol}, a^{tol}, \text {and } v^{tol}\) are explained in Sect. 5. The order parameters for disconnected agents (\(\phi ^{\mathrm {disc}}\)) and the minimum number of connected agents (\(\phi ^{\mathrm {cluster}}\)) are explained below. These parameters are calculated locally within clusters of radius \(r^{\mathrm {cluster}}\):

$$\begin{aligned} r^{\mathrm {cluster}} = r^{\mathrm {rep}} + r^{\mathrm {frict}} + \tilde{D}(v^{\mathrm {flock}}, a^{\mathrm {frict}}, p^{\mathrm {frict}}) \end{aligned}$$
(15)

\(\tilde{D}(v, a, p)\) is the braking distance, i.e., the distance r for which D(r, a, p) = v for any agent.
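Assuming the standard two-regime braking curve D of the Vásárhelyi et al. model (linear up to \(r = a/p^{2}\), square-root beyond), its inverse \(\tilde{D}\) and the cluster radius of Eq. (15) can be sketched as follows, where `r_rep`, `r_frict`, `v_flock`, `a_frict`, and `p_frict` stand for the corresponding simulation parameters:

```python
def D_tilde(v, a, p):
    # Inverse braking distance: the r for which D(r, a, p) = v, assuming
    # D is linear up to v = a/p and follows the square-root branch beyond.
    if v <= a / p:
        return v / p
    return (v**2 + a**2 / p**2) / (2.0 * a)

# Eq. (15): local cluster radius used by the connectivity order parameters
r_cluster = r_rep + r_frict + D_tilde(v_flock, a_frict, p_frict)
```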

3.2.1 Disconnected agents

This parameter measures the average number of completely disconnected agents throughout the simulation. Eq. (17) gives the number of agents within \(r^{\mathrm {cluster}}\) distance of each agent at any given moment. Eq. (18) then counts, over the simulation, the agents with zero connected neighbors, i.e., the disconnected ones.

$$\begin{aligned} \Theta (x)&= {\left\{ \begin{array}{ll} 1 &{} \text {if } x \ge 0 \\ 0 &{} \text {if } x < 0 \end{array}\right. } \end{aligned}$$
(16)
$$\begin{aligned} n_{i}^{\mathrm {cluster}}(t)&= \sum _{j \ne i}^{N-1} \Theta (r^{\mathrm {cluster}} - r_{ij}(t)) \end{aligned}$$
(17)
$$\begin{aligned} \phi ^{\mathrm {disc}}&= \frac{1}{T} \int _{0}^{T} \sum _{i=1}^{N} \left( 1-\Theta (n_{i}^{\mathrm {cluster}}(t)-1)\right) \,dt \end{aligned}$$
(18)

3.2.2 Minimum connected agents

This parameter measures the minimum number of connected agents, averaged over the simulation, and is therefore time-dependent. Since the drones start at random positions, it was observed that averaging this parameter over time, rather than taking its steady-state value (the global minimum over the whole simulation), gave a better picture of the robustness of the communication graph throughout the simulation.

$$\begin{aligned} \phi ^{\mathrm {cluster}}&= \frac{1}{T}\int _{0}^{T} \min _{i \in \{1,2,\ldots ,N\}} n_{i}^{\mathrm {cluster}}(t) \,dt \end{aligned}$$
(19)
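Both connectivity order parameters reduce to simple operations on the per-frame neighbor counts. A discrete-time sketch (positions stored as a \(T \times N \times 2\) array; the disconnection indicator counts agents with zero neighbors, per Sect. 3.2.1) is:

```python
import numpy as np

def cluster_order_parameters(r_hist, r_cluster):
    """Sketch of Eqs. (17)-(19): per-frame neighbor counts within r_cluster,
    averaged into the disconnection and minimum-cluster order parameters."""
    T = r_hist.shape[0]
    disc, min_conn = 0.0, 0.0
    for r in r_hist:
        d = np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)  # pairwise distances
        n = (d < r_cluster).sum(axis=1) - 1   # Eq. (17): neighbor count, excluding self
        disc += np.sum(n == 0)                # Eq. (18): fully disconnected agents
        min_conn += n.min()                   # Eq. (19): weakest agent this frame
    return disc / T, min_conn / T
```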

Finally, to optimize the fitness functions in Eq. (14) simultaneously, the six objectives must be analyzed for correlations so that the system can be represented with fewer objectives, preferably two. In the next section, Principal Component Analysis (PCA) is used for this dimensionality reduction so that the multi-objective optimizer NSGA-II can be applied effectively.

4 Dimensionality reduction using PCA

A data set of the six objectives discussed in Sect. 3 is collected to reduce the number of objective functions. This data is the result of 500 random simulations, sampled without any heuristic so as to cover the entire search space; the search space is the same as the one used for optimization in Sect. 6. Note that we use the fitness values after they have been passed through the transfer functions as the data for PCA. The analysis could also be run directly on the order parameters; the two choices would give different correlations depending on the nature of the transfer functions. We prefer the former method, as it gives a more accurate representation of the matrix components and of the exact fitness functions used for optimization. This data is used to create the covariance matrix and principal components following the process in Sect. 2, and a qualitative discussion of the correlations follows.
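The data-collection step itself is a plain Monte-Carlo sweep. A sketch is shown below, where `run_simulation` (hypothetical) returns the six transfer-function fitnesses for one decision vector, `principal_components` is the routine sketched in Sect. 2.1, and `lower`/`upper` are the bounds of Table 1:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
X = rng.uniform(lower, upper, size=(500, 12))  # unguided random decision vectors
F = np.array([run_simulation(x) for x in X])   # (500 x 6) matrix of fitnesses
K, variances, W = principal_components(F)      # covariance matrix and loadings
```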

Fig. 1: Matrices obtained from principal component analysis

Fig. 1 shows the matrices obtained from the application of PCA to the objective space. They reveal some interesting relationships, discussed below:

\(K_{23}\) is negative, implying that a higher speed does not necessarily mean higher cohesion. This relationship may not hold where the UAVs travel long distances at very high speeds or have a large turn radius (as with fixed-wing drones), but in a confined environment, speeds must be reduced to maintain cohesion at the edges (where the flock most often breaks up). This is also a consequence of the limited acceleration, which aligns with actual physical systems.

\(K_{13}\) also confirms the above statement regarding confined environments. To maintain cohesion at walls, the UAVs can either slow down or skip the wall altogether; a combination of slowing down and breaching the geo-fence makes the movement most efficient. Intuition would suggest that as speed increases the wall fitness should decrease, since only limited acceleration/deceleration is available. Upon running numerous simulations and inspecting the resulting covariance matrices, we found that the observed correlation arises because \(\mathcal {F}^{\mathrm {wall}}\) is itself time-dependent: the fitness is inversely proportional to the number of time frames the drones spend outside the wall. Since \(\mathcal {F}^{\mathrm {corr}}\), \(\mathcal {F}^{\mathrm {disc}}\), and \(\mathcal {F}^{\mathrm {cluster}}\) collectively maintain cohesion and connectivity wherever possible at the expense of \(\phi ^{\mathrm {vel}}\) and \(\mathcal {F}^{\mathrm {wall}}\), whenever the drones slow down they naturally spend more time frames outside the fence and turn slowly irrespective of the acceleration. This makes \(\mathcal {F}^{\mathrm {wall}}\) and \(\mathcal {F}^{\mathrm {speed}}\) directly correlated with each other on average. The elements \(w_{11}\), \(w_{12}\), and \(K_{12}\) reflect this fact.

Drones naturally collide less with each other when their velocities are aligned and they are well connected, because velocity changes then take much less time to propagate through the communication network. However, when this network is strongly connected, each agent has to sum over many velocity differences in its neighborhood. While this is advantageous when the neighbors move in similar directions, it can be detrimental when there is a lot of noise and the inter-agent velocity differences point in different directions: the summed alignment velocity for the agent is dampened by cancellation. This results in inter-agent friction and makes the entire flock sluggish (slow to react). This is clearly shown by the elements \(w_{2}\) and \(w_{3}\), which are strongly uncorrelated. It is also worth pointing out that the elements \(w_{3}\) through \(w_{6}\) are strongly correlated, which confirms the association of velocity cohesion and collisions with the communication network. As expected, the cluster parameters for disconnection and for the minimum number of connected UAVs are strongly correlated (\(K_{65}\)), as both are direct functions of the communication network.

Using a single objective can result in the loss of important information, as the final fitness is just a collective product or weighted sum. Notably, in noisy dynamical systems such as multi-agent robotics, efforts must be made to retain as much information as possible and use it intelligently to guide decision-making. We propose a multi-objective methodology for optimizing the swarm's fitness to tackle this problem.

The principal component (w) of maximum variance captures all the above relations and shows how the objectives relate to each other on average. The signs of its elements indicate the correlations, which give rise to the following features/objectives:

$$\begin{aligned} \mathcal {F}_{1}&= -(\mathcal {F}^{\mathrm {wall}} \ . \ \mathcal {F}^{\mathrm {speed}}) \end{aligned}$$
(20)
$$\begin{aligned} \mathcal {F}_{2}&= -(\mathcal {F}^{\mathrm {corr}} \ . \ \mathcal {F}^{\mathrm {coll}} \ . \ \mathcal {F}^{\mathrm {disc}} \ . \ \mathcal {F}^{\mathrm {cluster}}) \end{aligned}$$
(21)

Unlike traditional PCA, we do not keep only the non-redundant objectives. Each objective captures tangible physical information about the simulation; we therefore multiply the two groups individually to retain that information and to ease comparison with the single-objective CMA-ES optimizer in Sect. 6. Apart from enabling quantitative comparison, multiplying the \(\mathcal {F}\)'s also eased the qualitative analysis of collective behaviors: for example, a sudden drop in \(\mathcal {F}_2\) could be attributed to a specific fitness within the product and analyzed accordingly. One might also weight each objective by its respective correlations to determine the final objective functions. Note that we take the negative of each product because we minimize the objective functions.

It is worth mentioning that the above covariance matrix depends on the number of agents and the size of the confined arena. While some relations, such as those for cluster connectivity and cohesion, remain the same because they are independent of these parameters, a different non-redundant set of objectives was obtained upon changing the size of the geo-fence. The simulations showed that the same number of agents in a larger space took more time to align with the shill agents, owing to the stronger inter-agent alignment over long distances. The corresponding covariance matrix is not shown here for brevity. In Vásárhelyi et al. (2018), there is a certain ambiguity in the size of the geo-fence: while the authors state that they used a side length of 250 m for the square arena, the averaged results on their open-source simulator were closer to the claimed ones when a radius of 250 m (i.e., a side length of 500 m) was used. To ease comparisons, we also adopt the latter definition in our study.

The reduced objectives are passed to the multi-objective solver NSGA-II (Deb et al. 2002), and the results are summarized below.

5 Numerical experiments

A custom simulator, MOflock, was created in the Python programming language to test the proposed algorithm and support future work. The ease of setting up multiple processes and of leveraging optimization and machine learning libraries was a major influence in choosing Python. The simulator is highly object-oriented and modular: it abstracts the drone agent at various levels and allows experimenting with both single (Bot) and multiple collaborative agents (CoBot). The class diagram is given in Fig. 7 of Appendix C. Care was taken to keep the error between RobotSim (Vásárhelyi et al. 2018) and the current work under a threshold of 5-10%, so that MOflock can be validated against state-of-the-art simulators that have been tested on drone hardware. The code repository link is available at supplementary material S1 (2022), and a screenshot of the simulation is shown in Fig. 2. All experiments are carried out with a flocking velocity (\(v^{\mathrm {flock}}\)) and maximum velocity (\(v^{max}\)) of 6 m/s; however, no changes were made to the algorithm that would disrupt its scalability in velocity. Artificial GPS noise is added using the Brownian noise model of Virágh et al. (2014). Communication delays are integral to the result of the optimization, as they simulate a kind of inertia at the walls and with neighbors; without these delays and noises, the drones favor high gain and short-range repulsion, as opposed to the model optima. Note that the experiments in this paper are carried out in two-dimensional vector space.

Fig. 2: MOflock simulation screenshot. The blue and red swarms are non-interacting. See supplementary material S2 (2022) for a video of the simulation

After analyzing the covariance matrix for correlations (Sect. 4), the objectives are combined accordingly and passed to the multi-objective optimizer. A good multi-objective optimization algorithm should:

  1. Guide the solutions to an optimal Pareto front

  2. Maintain solution diversity

NSGA-II is proven to be one of the best-performing algorithms in this regard. The 'pymoo' Python library (Blank and Deb 2020) is used with default parameters.
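A minimal sketch of this setup with a recent pymoo version (≥ 0.5) is shown below; `simulate_fitnesses` is a hypothetical wrapper that runs one MOflock simulation and returns the six fitnesses of Eq. (14), and `xl`/`xu` are the bounds of Table 1. The defaults of `NSGA2` already use simulated binary crossover and polynomial mutation:

```python
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.core.problem import ElementwiseProblem
from pymoo.optimize import minimize

class FlockingProblem(ElementwiseProblem):
    """Wraps one flocking simulation as a 12-variable, 2-objective problem."""
    def __init__(self, xl, xu):
        super().__init__(n_var=12, n_obj=2, xl=xl, xu=xu)

    def _evaluate(self, x, out, *args, **kwargs):
        f = simulate_fitnesses(x)  # hypothetical: one simulation run
        out["F"] = [-(f["wall"] * f["speed"]),                            # Eq. (20)
                    -(f["corr"] * f["coll"] * f["disc"] * f["cluster"])]  # Eq. (21)

res = minimize(FlockingProblem(xl, xu), NSGA2(pop_size=50),
               ("n_gen", 40), seed=1, verbose=True)
pareto_front = res.F  # non-dominated objective values of the final population
```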

It is imperative to set up the optimization problem so that there is enough diversity in the search space to find "good enough" solutions through heuristic methods. For the sake of exploration, a test run was conducted using the CMA-ES algorithm without any bounds on the parameters. The solution for this setup revealed that the flock merely moves in a tight circle around the center and never collides with the walls. While such a solution is mathematically optimal, it ignores the physical limitations and logical constraints on the variable bounds; it arises because the correlation and wall fitnesses become abnormally high. For instance, this solution has a very large \(r_{0}^{\mathrm {frict}}\) value, much greater than the communication range of the drones, which lets each drone correlate its velocity to the maximum extent and increases \(\mathcal {F}^{\mathrm {corr}}\) drastically.

To avoid such solutions, which disregard environmental constraints, one can either set explicitly known bounds on the variables that are realistic and relevant to the physics of a UAV, or incorporate another objective that maximizes the search area covered in minimum time. For this study, the former approach is used without loss of generality. An iterative process is used to find the optimization bounds. To begin with, the exact optimization bounds from Vásárhelyi et al. (2018) are used and then tweaked progressively for our use case, relaxing or constraining them so that the specific behaviors we wanted to analyze through PCA (colliding with or surpassing the wall, coming very close to each other, etc.) could be observed. For instance, \(r_{0}^{\mathrm {shill}}\) and \(c^{\mathrm {shill}}\) are relaxed so that the walls can sometimes be ignored. Once all behaviors could be observed, we started optimizing in this space. In cases where the optimizer does not explore enough or there is a major change in the flocking physics, we tweak the bounds, conduct PCA again to obtain a new covariance matrix, and optimize over the newly obtained bounds. Some bounds were large enough and did not need any change (e.g., the separation parameters). The bounds used for the variables are shown in Table 1, and some miscellaneous simulation parameters, including the tolerance parameters \(r^{tol}\), \(a^{tol}\), and \(v^{tol}\) for the transfer functions in Sect. 3.2, are given in Table 2. Appropriate values for these tolerance parameters promote a better search of solutions and gradient directions.

We use simulated binary crossover and polynomial mutation with default parameters from the library (Blank and Deb 2020), and the optimization is run for 40 generations with a population size of 50. All experiments were performed on a machine with an AMD Ryzen 7 4800H CPU (8 cores, 16 threads) and 16 GB of RAM. The results are reported in Sect. 6.

Table 1 Optimization parameters
Table 2 Simulation parameters

6 Results and discussions

The results of the optimization procedure are analyzed and discussed in this section.

Fig. 3: Statistical evaluations (mean ± standard deviation) of different configurations with respect to the optimal Pareto front. The blue and red curves compare MOflock and RobotSim at the model optimum (\(X_a\)) for \(v^{\mathrm {flock}} = 6\) m/s. \(\mathcal {F}(X_{a}) \vert RobotSim \) is the multi-objective fitness for the model optimum \(X_{a}\) taken from Vásárhelyi et al. (2018) and evaluated on RobotSim itself; \(\mathcal {F}(X_{a}) \vert MOflock \) is the fitness for \(X_{a}\) evaluated on MOflock. \(\mathcal {F}^{CMA-ES}(X_{opt}) \vert MOflock \) is the optimized fitness result on our simulator using the CMA-ES algorithm. The highest-ranked Pareto front for the last generation of the NSGA-II algorithm is also shown; \(\mathcal {F}_{1}^{NSGA-II}\) and \(\mathcal {F}_{2}^{NSGA-II} \vert MOflock \) are the mean values at the extreme points of this Pareto front

Fig. 3 shows statistics over 100 simulations for the different points. As targeted, the error between the mean fitnesses of the two simulators at the model optimum of Vásárhelyi et al. (2018) is 4.28%. Note that \(\mathcal {F}^{CMA-ES}(X_{opt}) \vert MOflock \) was not obtained with a multi-objective algorithm but was separated into \(\mathcal {F}_{1}\) and \(\mathcal {F}_{2}\) according to Sect. 4, so that comparisons can easily be drawn between the single-objective and multi-objective results.

Fig. 4: Optimization results

Since the single-objective fitness is the product of all individual fitnesses, none of the six fitnesses can be allowed close to zero, nor can all be guaranteed to reach their maxima when some of them are negatively correlated. As a result, a single-objective optimizer seeks a 'best of all' compromise. With multiple conflicting objectives, however, this compromise can be traded away for better performance on the separated fitness functions. This also explains why the CMA-ES point lies around the knee of the Pareto fronts. It should be noted, however, that the CMA-ES optimum on our simulator outperforms the Pareto front at its knee, owing to the high degree of automation and robustness of the CMA-ES algorithm.

While the user can now choose any of the points depending on the scenario and the relative importance of the objectives, two interesting points on the optimal front correspond to the extreme situations in which one objective is fully compromised for the other. They are given by points A and B in Fig. 3, and the values of the variables and fitnesses at these points are summarized in Table 3. Fig. 4a shows every 4th Pareto front from the last generation; this spacing was chosen only to display the spread and convergence cleanly. A snapshot of the relevant simulations for both points is shown in Fig. 4b, along with the graphs of their order parameters in Fig. 5. They can be qualitatively understood as follows:

Point A: A weaker cluster-dependent fitness shows that multiple clusters can coexist in the same environment when cohesion and speed are sacrificed.

Point B: Similarly, the other point clearly skips the geo-fence and/or slows down to maintain good cohesion, compensating for the damping caused by inter-agent friction and pressure at the walls.

The above two points are a direct result of the physical and environmental restrictions imposed on the swarm: the limited acceleration does not allow the entire swarm to turn sharply without slowing down, and the confining walls do not allow agents to flock together at high speed without losing some cohesion. These statements are a testament to the complex dynamics that multi-agent systems exhibit. A video showing the above interactions is available at supplementary material S2 (2022). Better mathematical formalism and high-fidelity simulations can be developed to capture such intertwined relationships.

Fig. 5: Cumulative order parameters for points A and B. The order parameters have been scaled equally using simulation parameters to fit a (0,1)-range ordinate

The trends in the order parameters in Fig. 5 also confirm the covariance matrix elements in Sect. 4. Note that the graph is scaled to the (0,1) interval using the relevant maximum feasible values for each parameter, and cumulative values are shown for the curves.

Table 3 Optimization results

Further statistical analysis of the optimization data shows considerable redundancy in the decision variables, as indicated by the following observations:

  • Even though points A and B are far apart on the Pareto front, their respective parameters for repulsion are very similar.

  • It was observed that the right combination of \(r_{0}^{\mathrm {shill}}\) and \(a^{\mathrm {shill}}\) gives similar fitnesses and order parameters even with a constant shilling velocity.

  • The introduced shill gain (\(c^{\mathrm {shill}}\)) does not take its maximum possible value (1.0) even when seeking the best \(\mathcal {F}_{1}\), which is highly dependent on this parameter.

Note that a complete PCA correlation analysis on the decision variables could be performed to confirm these observations and reduce the dimension of the input space as well.

The above results are more consequential than just a Pareto front. Real-life missions and the inherently stochastic nature of the environment demand a range of potential solutions from which a human in the loop can choose in an ad hoc manner. A typical mission profile consists of cruise, loiter, surveillance, and occasionally a payload drop. A brief description of the practical applications of the Pareto-optimal points follows.

  • Target search and loitering is a common phase in surveillance missions. A snapshot of an extreme case, where the target is located at a corner of the geo-fence, is shown in Fig. 6a: the flock breaks up at corners and walls to loiter around the target. To make this observation mathematically sound, another order parameter, \(\phi ^{target}\), is created. The target-following physical model is taken from Virágh et al. (2014).

    $$\begin{aligned} {\textbf {x}}^{COM}(t)&= \frac{\sum _{i =1}^{N} {\textbf {r}}_{i}(t)}{N} \end{aligned}$$
    (22)
    $$\begin{aligned} \bar{d}^{target}(t)&= \Vert {\textbf {x}}^{target}-{\textbf {x}}^{COM}(t) \Vert \end{aligned}$$
    (23)
  • where,

  • \({\textbf {x}}^{COM}(t) \equiv \) Center of mass of the swarm at time t

  • \(\bar{d}^{target}(t) \equiv \) Distance from the swarm's center of mass (over all N agents) to the target at time t

This parameter comprises two performance measures: the closeness of the entire flock to the target on average (Eq. 23) and the 'loiter frequency' (\(\omega \)), which measures how fast the flock can loiter around the target and turn around as a whole. As opposed to the other parameters, the steady-state version of this parameter is measured. Since the motion is circular and periodic, the time series is fitted to a sinusoid, much like an audio signal.

$$\begin{aligned} \phi ^{target}(t)&= a \sin (\omega t + \psi ) + c \end{aligned}$$
(24)
$$\begin{aligned} F&= FFT(\phi ^{target}) \end{aligned}$$
(25)
$$\begin{aligned} \overline{\bar{d}^{target}}&= \frac{1}{T} \sum _{t=0}^{T-1} \bar{d}^{target}(t) \end{aligned}$$
(26)
$$\begin{aligned} a_{o}&= \sqrt{\frac{2}{T}\sum _{t=0}^{T-1}(\bar{d}^{target}(t)- \overline{\bar{d}^{target}})^{2}} \end{aligned}$$
(27)
$$\begin{aligned} f_{o}&= \vert f^{s}_{argmax( \vert F_{k} \vert ) } \vert \end{aligned}$$
(28)
$$\begin{aligned} \psi _{o}&= 0 \end{aligned}$$
(29)
$$\begin{aligned} c_{o}&= \overline{\bar{d}^{target}} \end{aligned}$$
(30)
$$\begin{aligned} a, \omega , \psi , c&= LSF(\phi ^{target}, \bar{d}^{target},a_{o}, f_{o}, \psi _{o}, c_{o}) \end{aligned}$$
(31)
  • where,

  • \(F \ \equiv \) Fourier transform output

  • \(\overline{\bar{d}^{target}} \equiv \) Mean of the average distance throughout the simulation

  • \(f^{s} \ \equiv \) Sample frequencies for the time series data

  • \(\bar{d}^{target} \ \equiv \) Average distance throughout the simulation

  • \(f_{o} \ \equiv \) Initial guess of frequency for \(\phi ^{target}(t)\) corresponding to the maximum F

  • \(\psi _{o} \ \equiv \) Initial guess of phase for \(\phi ^{target}(t)\)

  • \(a_{o} \ \equiv \) Initial guess of amplitude for \(\phi ^{target}(t)\)

  • \(c_{o} \ \equiv \) Initial guess of offset for \(\phi ^{target}(t)\)

This is done by first estimating the initial coefficients, namely the amplitude (\(a_{o}\)), phase (\(\psi _{o}\)), offset (\(c_{o}\)), and frequency (\(f_{o} = \omega _{o}/2\pi \)), via a Fast Fourier Transform (FFT) on the data (Eqs. 24-25), and then passing this estimate to a least-squares curve fit, represented by LSF (Eq. 31). The final order parameter is simply the angular frequency divided by the amplitude.

$$\begin{aligned} \mathcal {F}^{target}&= \omega /a \end{aligned}$$
(32)
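The procedure can be sketched end to end with NumPy and SciPy as below; we read Eq. (24) as a sinusoidal model of \(\bar{d}^{target}\) over time, and skipping the DC bin when picking the dominant frequency is our own safeguard rather than part of Eq. (28):

```python
import numpy as np
from scipy.optimize import curve_fit

def loiter_fitness(d_target, dt):
    """Sketch of Eqs. (24)-(32): estimate the loiter frequency from the
    distance-to-target time series via an FFT initial guess followed by a
    least-squares sinusoid fit."""
    t = np.arange(len(d_target)) * dt
    c0 = d_target.mean()                               # Eq. (30): offset guess
    a0 = np.sqrt(2.0 * np.mean((d_target - c0) ** 2))  # Eq. (27): amplitude guess
    spec = np.fft.rfft(d_target - c0)                  # Eq. (25): FFT of the series
    freqs = np.fft.rfftfreq(len(d_target), d=dt)
    f0 = abs(freqs[np.argmax(np.abs(spec[1:])) + 1])   # Eq. (28): dominant frequency
    model = lambda t, a, w, psi, c: a * np.sin(w * t + psi) + c   # Eq. (24)
    (a, w, psi, c), _ = curve_fit(model, t, d_target,
                                  p0=[a0, 2 * np.pi * f0, 0.0, c0])  # Eq. (31)
    return w / a                                       # Eq. (32)
```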

The analysis shows that point A on the Pareto front has a lower loiter frequency because of the extra inter-agent friction created to maintain flock cohesion. Point B, on the other hand, has almost half the amplitude and double the frequency because of its higher velocity and loosely correlated flock with more collisions. These curves and an accompanying simulation screenshot are shown in Fig. 6. A full video showing the target tracking and fitness analysis is available at supplementary material S3 (2022).

Fig. 6: Target order parameter comparison

  • There have been recent studies in which collisions are handled explicitly through physical boundaries and mechanisms instead of an implicit algorithm (Mulgaonkar et al. 2017); the idea is to allow some collisions as long as agility is maintained and the drones reach their target. Point A on the front is akin to such a situation: the flock pays little attention to inter-agent separation or cohesion in local clusters and instead gives speed a higher priority. This is especially useful when tiny drones need to traverse narrow passages and crevices, not as a fully connected flock, but getting through the region as fast as possible, with each drone acting for itself.

  • Point B naturally resembles a good flock where connectivity and cohesion are concerned. The decentralized neighbor architecture makes this flock very desirable where robustness and swarm health are absolute requirements and the entire swarm needs to travel long distances as a fully connected cluster.

While developing the methodology for this work, several characteristics of collective behavior were noticed in the multi-agent simulations. For instance, the parameters that characterize the swarm changed drastically with factors like the communication delay and the arena size; these two variables affect the swarm as a whole because any control action for an agent close to the wall propagates throughout the swarm with the corresponding communication delay. Naturally, every PCA run with different simulation parameters yielded unique objectives and, therefore, a different Pareto front. The advantage of separating the objective function into multiple grouped objectives is that global swarm behavior can be controlled by choosing a point on the Pareto front instead of tuning parameters manually or running an offline optimization for each situation the swarm might encounter. It follows that this swarm behavior can be controlled by a supervisor with access to the appropriate Pareto front. Moreover, such distinctions and swarm gradients become more consequential once human-readable names are attributed to the fitnesses: in our case, the swarm that minimizes \(\mathcal {F}_1\) acts as an 'agile' swarm, whereas minimizing \(\mathcal {F}_{2}\) yields a 'cohesive' swarm. A human operator with access to the Pareto front and such terminology could direct global swarm behavior according to the situation at hand. Similar notions of 'stable' vs. 'sensitive' swarms have been developed in the past (Balázs et al. 2020) and show that generalizing a semi-autonomous swarm across scenarios is often difficult to handle with online learning algorithms alone. A compromise between the two, in which large-scale behavior is controlled through multiple objectives and individual decision-making through reinforcement learning, can be sought to solve the generalization problem.

7 Conclusion

In this paper, we proposed a methodology to address the problem of drone flocking. First, a simulator with an integrated optimizer was designed to test the algorithm. The decision variables that characterize the flocking operators, and the fitness functions that indicate the swarm's performance, were derived from the Vásárhelyi et al. model and modified accordingly. To use the multi-objective optimizer effectively, the six-dimensional objective space was reduced to two dimensions using Principal Component Analysis. The correlation analysis revealed that the fitness functions for speed and wall avoidance can be treated separately from those for the cohesive movement of the entire flock. This process also provided insight into the various complex relationships that multi-agent systems can exhibit. The resulting two-objective optimization problem was then solved using NSGA-II, and the results were compared with the single-objective CMA-ES algorithm. It was found that while CMA-ES performs better near the knee of the Pareto front (in situations where a 'best of all' configuration is required), NSGA-II outperforms CMA-ES at the extreme points, as it offers an entire range of solutions to choose from. The study also discussed use cases in which such a Pareto front guides decision-making in real-world scenarios. Incorporating algorithms such as reinforcement learning into the proposed methodology is a promising direction for future research.