1 Introduction

Unmanned aerial vehicles (UAVs) have been widely used in civilian and military fields in recent years, in applications that include, but are not limited to, target tracking, reconnaissance, surveillance, remote sensing, and battle damage assessment. Compared with a single UAV, multiple UAVs can perform these missions more efficiently and robustly. There has been a great deal of work on developing coherent and efficient coordination of multiple UAVs for many cooperative control problems [1,2,3,4,5,6,7], e.g., formation flight, cooperative surveillance, obstacle avoidance, target tracking, communication maintenance, and target detection and localization.

This paper focuses on the cooperative search problem in complex environments, based on prior regional information about the target probability, the sensor detection difficulty, and the geographic environment. The framework of this problem can be divided into two parts: environment modeling and the decision-making process. The goal is to obtain effective, collision-free coverage routes along which the UAV group can search for the target of interest and gather the maximum payoff (i.e., cumulated detection probability), or equivalently reduce the uncertainty as much as possible, within the given search time.

Search theory was first developed for single-UAV search-and-rescue operations [8, 9]. The region is usually discretized into grids with associated target probabilities, and the optimal trajectory is then generated based on the probability distributions obtained from Bayesian filter updates. Many mathematical and heuristic approaches have been proposed to guide a single UAV in searching for a stationary or moving target in the region, including greedy methods (one-step or N-step look-ahead) [10], probabilistic methods [11], and so on. In Ref. [12], a hierarchical robot-independent system is designed to compute a complete coverage trajectory with minimum repetition through a known obstacle environment. The whole region is decomposed into several free subregions, the optimal routes through these connected subregions are computed, and the complete coverage path is finally obtained by accounting for the range of the deployed sensor. Lin et al. [13] present a hierarchical heuristic search method for UAV coverage path planning. By prioritizing regions according to a probability distribution modeled with a Gaussian mixture model (GMM), this method outperforms traditional heuristic methods in terms of search efficiency.

As mentioned above, the single-UAV search problem has been well studied. However, cooperative search in a complex environment is a much more complicated problem. Cooperation mechanisms, e.g., mission allocation, collision avoidance, simultaneous arrival at a given end point, and communication maintenance, must be taken into account [14]. Unlike traditional path planning with obstacle avoidance [29], which is a variant of the traveling salesman problem, the cooperative search problem is much more difficult owing to the challenges of partial detection and repeated visits. In order to gather more information, UAVs should treat all uncertain areas as possible waypoints. In recent years, researchers have made a great deal of progress on cooperative target search.

Zamboni search produces parallel trajectories that ensure UAVs visit all points at least once [15, 16]. It is an exhaustive coverage method, as is random search. Such methods may be the best strategy when the target probability is uniformly distributed or completely unknown, but they are clearly inefficient when prior information about the target exists or the search time is limited. To our knowledge, search-graph-based methods are among the most widely used algorithms for cooperative search. In Ref. [17], on the basis of a dynamically updated graph, a receding horizon optimization-based cooperative search algorithm is presented to discover the target with probability one in finite time; it jointly optimizes the routes and the sensor orientations. Millet et al. [18] present a decentralized cooperative search method with a two-step updating procedure for the probability maps. Based on the Bayesian rule, each UAV updates its individual probability map of the coverage area after one observation and then transmits the map to its neighbors for map fusion. In Ref. [19], multi-UAV target search in a large-scale environment is solved by explicit decentralized gradient-based negotiation, which has computational advantages and robustness compared with conventional mathematical methods. Ref. [20] utilizes distributed model predictive control (DMPC) to solve the cooperative search problem, in which the target area is thoroughly visited by a team of vehicles while avoiding collisions and duplicated effort. Other methods include, but are not limited to, local hill climbing (i.e., the one-step greedy method) [21], intelligent methods [22], mixed-integer linear programming (MILP) [23], and so on. However, these methods may suffer from the local optimum problem, meaning that UAVs may linger too long in a subregion with a local maximum of target probability before breaking away; the efficiency is therefore low, especially in a large search space with a complex probability distribution. Unlike the above methods, one efficient approach is to first partition the space into several subregions [5, 15, 24, 26], e.g., by polygonal decomposition, Voronoi graph decomposition, or fuzzy c-means clustering. The number of subregions is usually equal to the number of UAVs. The cooperative search problem is then reduced to a simple single-UAV search problem, where each UAV only needs to plan its own coverage trajectory within its allocated subregion, and collisions between UAVs are naturally avoided. However, the allocation principle is sometimes rough or unreasonable, and the local optimum problem may still exist within a subregion.

In this paper, we focus on solving the cooperative search problem in complex environments. By combining GMM and receding horizon control (RHC), a hybrid method with a three-layer structure is presented. First, since GMM is an efficient data clustering method, it is utilized to partition the region into several subregions on the basis of the probability distribution map. Second, the subregions are prioritized and then allocated to UAVs based on the predicted payoff; each UAV is assigned one or several subregions to visit. Third, the search path of each UAV is planned by an RHC-based concurrent method: the path segment in each allocated subregion is extended by RHC concurrently, and at each step the segment yielding the maximum payoff for one more visit is chosen. The UAV trajectory is finally obtained by combining these segments, and the local optimum problem is thereby well resolved.

The remainder of this paper is organized as follows. Section 2 formulates the cooperative search problem, including the modeling of the environment and the various constraints. Section 3 describes the three-layer structure of the GMM–RHC method in detail. Simulation results are presented in Sect. 4, and Sect. 5 draws the conclusion.

2 Cooperative search problem formulation

2.1 Environment modeling

In this paper, we suppose that \(N_u\) homogeneous UAVs perform the search mission over a rectangular area S at a constant height H, and that the search region is uniformly divided into \(M=L_x \times L_y \) grids. The target speed is assumed to be much smaller than that of a UAV. For each grid \(m(m=1,2,\ldots ,M)\), there is an associated value \(p(m)\in \left[ {0,1} \right] \), which denotes the probability that the target exists in this grid. The values of p(m) for all grids constitute the probability distribution map, and we define:

$$\begin{aligned} \sum \limits _{m=1}^M {p(m)} =1 \end{aligned}$$
(1)

Each UAV \(i(i=1,2,\ldots ,N_u )\) is equipped with a gimbaled camera whose line of sight (LOS) always points straight down, even when the UAV performs yaw or roll maneuvers. We assume that the UAV is always located over the center of one grid and that the field of view (FOV) of the camera covers at least this grid at each time step \(t(t=1,2,\ldots ,T)\). Besides, each camera footprint is regarded as a single glimpse, meaning that the corresponding grid is detected only once per time step. It should be noted that, owing to limited sensor reliability and the difficulty of complex areas (e.g., vegetation, buildings), detection may fail even when the camera covers the grid occupied by the target. Hence g(m) is introduced to denote the detection probability when the sensor covers grid m.

Suppose first that the FOV covers exactly one grid. Before modeling g(m), we first model the complex areas of the search space. In this paper, the search is performed in an urban environment containing tall buildings (whose height satisfies \(h\ge H)\) and low buildings (\(h<H)\). Cylinders and cuboids are the main building types, and grid m is considered to be occupied by a building if the grid center lies inside that building. The grids occupied by tall buildings constitute the restricted region or no-fly zone \({\varvec{X}}^R\), which UAVs must avoid, and the corresponding detection probability is assumed to be 0 for simplicity. The grids occupied by low buildings constitute the area \({\varvec{X}}^D\) with zero detection probability, but UAVs can fly safely over this area. Based on the above assumptions, we model g(m) as follows:

$$\begin{aligned} g(m)=\left\{ {{\begin{array}{l@{\quad }l} 0&{} m\in {\varvec{X}}^R \hbox { or } m\in {\varvec{X}}^D\\ g_s &{} \hbox {otherwise} \\ \end{array} }} \right. \end{aligned}$$
(2)

where \(g_s \in (0,1]\) is the reliability coefficient of the sensor in a free environment; it mathematically quantifies the belief that one sensor glimpse reduces the grid uncertainty.

Equation (2) applies when the FOV covers exactly one grid. However, the FOV sometimes covers more than one grid, as shown in Fig. 1a, where \(R_s\) is the FOV radius, \(\theta \) is the FOV angle, and H is the flight height. In this case, LOS occlusion caused by city buildings should be considered. For simplicity, if the straight line between the UAV and a grid center intersects a building, we consider that occlusion occurs and the detection probability is 0. However, this judgment is very time-consuming because UAV positions vary, so we instead model the visibility region. We define \({\varvec{X}}_m^H\) as the visibility region of grid m at height H, shown as the gray area in Fig. 1b and obtained by the sweep algorithm [5]. Only when the UAV is inside the visibility region, i.e., \({\varvec{x}}_i \in {\varvec{X}}_m^H\), can the grid be detected. Hence g(m) is redefined as follows:

$$\begin{aligned} g(m)=\left\{ {{\begin{array}{l@{\quad }l} 0&{} m\in {\varvec{X}}^R \hbox { or }m\in {\varvec{X}}^D \hbox { or } {\varvec{x}}_i \notin {\varvec{X}}_m^H \\ g_s &{} \hbox {otherwise} \\ \end{array} }} \right. \end{aligned}$$
(3)
Fig. 1 Illustration of sensor FOV and visibility region. a Sensor FOV. b Visibility region
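As a concrete illustration of Eq. (3), the sketch below evaluates g(m) for one grid, assuming that boolean maps of \({\varvec{X}}^R\) and \({\varvec{X}}^D\) and the per-grid visibility regions \({\varvec{X}}_m^H\) have been precomputed (e.g., by the sweep algorithm); the function and argument names are illustrative rather than part of the paper.

```python
def detection_probability(m, uav_pos, restricted, low_buildings, visibility, g_s=0.95):
    """Detection probability g(m) of Eq. (3) for one grid m.

    `restricted` and `low_buildings` are boolean maps of X^R and X^D, and
    `visibility[m]` is the precomputed visibility region X_m^H of grid m
    (e.g., from the sweep algorithm). All names are illustrative.
    """
    if restricted[m] or low_buildings[m]:
        return 0.0                      # grid lies in X^R or X^D
    if uav_pos not in visibility[m]:
        return 0.0                      # LOS to the grid center is occluded
    return g_s                          # free environment: sensor reliability g_s
```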

Let \(d_i^t (m)\in \left\{ {0,1} \right\} \) be a flag describing whether UAV i scans grid m at time index t, \(l_i^t (m)=\sum \nolimits _{j=1}^t {d_i^j (m)}\) be the total number of looks that UAV i has taken over grid m by time t, and \(L^{t}(m)=\sum \nolimits _{i=1}^{N_u } {l_i^t (m)}\) be the total number of looks over grid m by all the UAVs. We assume that each UAV detects the grids independently with probability g(m), and that the observations of different UAVs are also independent. Therefore, based on the detection history, the probability of having detected the target over grid m by time t is given as follows:

$$\begin{aligned} G^t (m)=1-\left( {1-g(m)} \right) ^{{{L^{t}}(m)}} \end{aligned}$$
(4)

If UAV i takes one more look at grid m, the estimated increment of the probability \(G^t (m)\) will be:

$$\begin{aligned} \Delta G^t (m)=\left( {1-g(m)} \right) ^{{{L^{t}}(m)}}g(m) \end{aligned}$$
(5)

It is apparent from Eq. (5) that the first look at a grid yields the maximum increase in certainty about this grid, which changes from 0 to g(m). This update rule naturally expresses the diminishing returns of each additional camera look. By combining the target existence probability p(m), we obtain the actual detection reward over grid m by time t:

$$\begin{aligned} P^t (m)=p(m)G^t (m) \end{aligned}$$
(6)

If UAV i takes one more look at grid m, the estimated increment of the detection reward will be:

$$\begin{aligned} \Delta P^t (m){=}\left( {1-g(m)} \right) ^{{{L^{t}}(m)}}g(m)p(m) \end{aligned}$$
(7)

The total detection reward or the cumulated detection probability by time t is computed by:

$$\begin{aligned} P_M^t =\sum \limits _{m=1}^M {P^{t}(m)} \end{aligned}$$
(8)
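The update rules of Eqs. (4)–(8) can be summarized by a short sketch; the array-based layout and the function names below are assumptions made for illustration.

```python
import numpy as np

def detection_reward(p, g, looks):
    """Total detection reward P_M^t of Eq. (8).

    p, g, looks are arrays of the same shape holding, per grid m, the target
    probability p(m) of Eq. (1), the detection probability g(m) of Eq. (3), and
    the accumulated number of looks L^t(m).
    """
    G = 1.0 - (1.0 - g) ** looks          # probability of detection by time t, Eq. (4)
    return float(np.sum(p * G))           # Eq. (6) summed over all grids, Eq. (8)

def one_more_look_gain(p, g, looks, m):
    """Expected increment in detection reward from one extra look at grid m, Eq. (7)."""
    return (1.0 - g[m]) ** looks[m] * g[m] * p[m]
```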

Define \(\phi =\{ \phi _1 ,\ldots ,\phi _{{N_{u}}}\}\) as the planned trajectories of the \(N_u\) UAVs over the time horizon T, and let \(\Phi \) represent all possible cases of \(\phi \). For any candidate trajectories \(\phi \in \Phi \), according to the statistics of the number of looks over each grid, we can obtain the total detection reward \(P_M^T (\phi )\) from Eq. (8). There is an optimal solution \(\phi ^{{*}}\) in the set \(\Phi \) that satisfies \(P_M^T (\phi ^{{*}})\ge P_M^T (\phi ),\forall \phi \in \Phi \). Therefore, our goal is to find the optimal routes \(\phi ^{{*}}\) that obtain the maximum detection payoff, and this cooperative target search process can be regarded as a discrete optimization problem.

If the time T approaches infinity, the number of looks over each grid becomes large enough that \(P_M^T (\phi )\) eventually approaches 1. In engineering applications, however, T is finite, so \(P_M^T (\phi )<1\) holds. For a fair comparison, the detection efficiency is introduced here to measure the path quality of different solutions:

$$\begin{aligned} r_{eff} =\frac{P_M^T (\phi )}{P_M^T (\phi ^{{*}})} \end{aligned}$$
(9)

where \(r_{eff} \le 1\) holds. However, since the optimal route \(\phi ^{{*}}\) is actually unknown, the ideal trajectories \(\phi _{\mathrm{ideal}} =\{\phi _{\mathrm{ideal}1},\ldots ,\phi _{\mathrm{ideal}N_u }\}\) are constructed here to replace \(\phi ^{{*}}\). The process of generating \(\phi _{\mathrm{ideal}}\) [13] is as follows. First, for each UAV i, its initial point is taken as the first waypoint (\(t=1)\) of its path \(\phi _{\mathrm{ideal}i}\). Second, at the next time step (\(t=t+1)\), each UAV in turn transfers to the selected grid for which one more look yields the largest probability increment according to Eq. (7), and the total number of looks over the selected grid is updated accordingly. The above procedure continues until \(t>T\) holds, giving the ideal path of each UAV. The detection efficiency is hence rewritten as:

$$\begin{aligned} r_{eff} =\frac{P_M^T (\phi )}{P_M^T (\phi _{\mathrm{ideal}} )} \end{aligned}$$
(10)

In addition, during the generation of \(\phi _{\mathrm{ideal}}\), the selected grid should lie outside the no-fly zone to achieve obstacle avoidance, and to avoid collisions between UAVs, a grid must not be occupied by two or more UAVs at the same time. It should be noted, however, that most consecutive waypoints are still non-adjacent grids or require large steering angles, meaning that the ideal UAV jumps from one grid to another while ignoring the UAV dynamic constraints.
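A minimal sketch of this greedy construction of \(\phi _{\mathrm{ideal}}\) is shown below, assuming flat arrays over the M grids, a boolean no-fly mask, and that the initial grid of each UAV counts as its first look; all identifiers are illustrative.

```python
import numpy as np

def ideal_trajectories(p, g, starts, T, restricted):
    """Greedy construction of the ideal paths phi_ideal used in Eq. (10).

    At every step each UAV jumps to the grid with the largest one-look increment
    of Eq. (7); UAV dynamics are ignored, and no-fly grids or grids already chosen
    by another UAV at the same step are excluded. p and g are flat arrays over the
    M grids, `restricted` is a boolean no-fly mask, and `starts` holds one initial
    grid index per UAV.
    """
    looks = np.zeros_like(p, dtype=float)
    paths = [[s] for s in starts]
    for s in starts:
        looks[s] += 1.0                              # the initial grid is observed at t = 1
    for _ in range(1, T):
        occupied = set()
        for path in paths:
            gain = (1.0 - g) ** looks * g * p        # Eq. (7) evaluated for every grid
            gain[restricted] = -np.inf               # obstacle avoidance
            gain[list(occupied)] = -np.inf           # collision avoidance at this step
            m_best = int(np.argmax(gain))
            path.append(m_best)
            looks[m_best] += 1.0
            occupied.add(m_best)
    return paths
```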

2.2 Simplified UAV model

As mentioned above, UAVs fly at a constant altitude, so a UAV position can be simplified as its projection onto the search area S. Besides, each agent is assumed to be over the center of one grid at each time step. Hence the position of agent i at time t can be denoted as \(\left[ {x_i^t ,y_i^t } \right] \in \left\{ {1,\ldots ,L_x } \right\} \times \left\{ {1,\ldots ,L_y } \right\} \). Each UAV moves from its current grid to a neighboring grid at each time step subject to the dynamic constraints. In this paper, the physical curvature radius constraint is mainly considered for simplification: we define \(\varphi _i^t \in \left\{ {0^{\circ },45^{\circ },\ldots ,270^{\circ },315^{\circ }} \right\} \) as the heading angle of agent i at time t, so that the UAV has three possible orientations at the next time step \(t+1\), i.e., \(\varphi _i^{t+1} \in \left\{ {\varphi _i^t -45^{\circ },\varphi _i^t ,\varphi _i^t +45^{\circ }} \right\} \), and the heading rate is denoted as \(\omega _i^t \in \left\{ {-45^{\circ },0^{\circ },45^{\circ }} \right\} \). To ensure that the agent is always located at a grid center at each time step, the agent velocity \(v_i^t\) for the orientations \(\varphi _i^{t+1} \in \{45^{\circ },135^{\circ },225^{\circ },315^{\circ }\}\) should be \(\sqrt{2}\) times that of the other cases. We take \({\varvec{z}}_i^t =\left[ {x_i^t ,y_i^t ,\varphi _i^t } \right] ^{\mathrm{T}}\) as the state of agent i at time t and \(u_i^t =\omega _i^t\) as the control input. The corresponding continuous forms of the state and control input are \({\varvec{z}}_i =\left[ {x_i,y_i ,\varphi _i } \right] ^{\mathrm{T}}\) and \(u_i =\omega _i\), respectively. The continuous form of the UAV kinematics \(\dot{{\varvec{z}}}_i =f({\varvec{z}}_i ,u_i )\) can hence be simplified as:

$$\begin{aligned} \left\{ {\begin{array}{l} \dot{x}_i =v_i \cos \varphi _i \\ \dot{y}_i =v_i \sin \varphi _i \\ \dot{\varphi }_i =\omega _i \\ \end{array}} \right. \end{aligned}$$
(11)

The discrete form of Eq. (11) is:

$$\begin{aligned} {\varvec{z}}_i^{t+1} ={\varvec{z}}_i^t +f({\varvec{z}}_i^t ,u_i^t ,\Delta t) \end{aligned}$$
(12)

where

$$\begin{aligned} f({\varvec{z}}_i^t ,u_i^t ,\Delta t)=\left[ {\begin{array}{l} v_i^t \cos (\varphi _i^t +\omega _i^t \Delta t)\Delta t \\ v_i^t \sin (\varphi _i^t +\omega _i^t \Delta t)\Delta t \\ \omega _i^t \Delta t \\ \end{array}} \right] \end{aligned}$$
(13)

where \(\Delta t\) is the sampling time interval (\(\Delta t=1\) s in this paper). With this formulation, the grids that a UAV searches in sequence constitute the waypoints of its path, and consecutive waypoints are connected to construct the planned path.
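A minimal sketch of one discrete update of Eqs. (12)–(13), with the grid size normalized to 1 and the \(\sqrt{2}\) speed scaling for diagonal headings, is given below; the function name and default values are illustrative.

```python
import math

def step(state, omega, v=1.0, dt=1.0):
    """One discrete kinematic update, Eqs. (12)-(13), with the grid size normalized to 1.

    state = (x, y, phi) with phi in degrees; omega is the heading rate in
    {-45, 0, +45} degrees per step. For diagonal headings the speed is scaled by
    sqrt(2) so that the UAV always lands on a grid center, as stated in Sect. 2.2.
    """
    x, y, phi = state
    phi_new = (phi + omega * dt) % 360.0
    v_eff = v * math.sqrt(2.0) if phi_new % 90.0 == 45.0 else v
    x += v_eff * math.cos(math.radians(phi_new)) * dt
    y += v_eff * math.sin(math.radians(phi_new)) * dt
    return (x, y, phi_new)
```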

2.3 Constraint conditions

We assume that connectivity or communication between any two UAVs is always maintained in this paper. That is to say, each UAV perfectly obtains the states of all UAVs and stores their detection histories as well. Besides, the constraint of collision avoidance is considered satisfied if any two UAVs \((\forall i,j\in \{1,\ldots ,N_u \}\hbox { and }i\ne j)\) do not occupy the same grid at the same time:

$$\begin{aligned} \left[ {x_i^t ,y_i^t } \right] \ne \left[ {x_j^t ,y_j^t } \right] \end{aligned}$$
(14)

In a complex environment, there are various obstacles that UAVs must avoid during the target search. In the urban environment studied in this paper, a UAV should always remain outside the restricted region:

$$\begin{aligned} \left[ {x_i^t ,y_i^t } \right] \notin {\varvec{X}}^R \end{aligned}$$
(15)

In addition to the UAV motion constraints described in Sect. 2.2, there are also initial constraints, terminal constraints, and time constraints for UAV regroup or assembly. In this paper, each UAV is required to reach its pre-specified destination \(\left[ {x_i^f,y_i^f } \right] \) from its initial point \(\left[ {x_i^0 ,y_i^0 } \right] \). Moreover, all UAVs should arrive at their destinations simultaneously at the specified time T. Hence these constraints for each UAV \(i(i=1,2,\ldots ,N_u)\) can be modeled as:

$$\begin{aligned} \left[ {x_i^1 ,y_i^1 } \right]= & {} \left[ {x_i^0 ,y_i^0 } \right] \end{aligned}$$
(16)
$$\begin{aligned} \left[ {x_i^T ,y_i^T } \right]= & {} \left[ {x_i^f ,y_i^f } \right] \end{aligned}$$
(17)
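For completeness, a small sketch that checks Eqs. (14)–(17) for a set of candidate paths might look as follows; the data layout is an assumption for illustration.

```python
def feasible(paths, restricted, starts, goals, T):
    """Check Eqs. (14)-(17) for a set of planned paths.

    `paths[i]` is the list of (x, y) grid positions of UAV i for t = 1,...,T;
    `restricted` is the set of no-fly grid positions X^R. Illustrative layout only.
    """
    for i, path in enumerate(paths):
        if len(path) != T or path[0] != starts[i] or path[-1] != goals[i]:
            return False                         # Eqs. (16)-(17): start, end, and common time T
        if any(pos in restricted for pos in path):
            return False                         # Eq. (15): no-fly zones
    for t in range(T):
        positions = [path[t] for path in paths]
        if len(set(positions)) != len(positions):
            return False                         # Eq. (14): no two UAVs share a grid at time t
    return True
```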
Fig. 2 Three-layer structure of cooperative search by GMM–RHC

3 Cooperative search strategy based on GMM–RHC

3.1 Multi-UAV search framework using GMM–RHC

As discussed above, one traditional and effective approach to multi-UAV cooperative search is to partition the search space into several subregions, with the number of subregions equal to the number of UAVs. The cooperative search problem is then reduced to a simple single-UAV search problem, where each UAV only needs to compute its own trajectory in its allocated subregion, and collisions between UAVs are hence naturally avoided. However, the allocation principle is sometimes rough and unreasonable. Consequently, if there are several probability peaks in its subregion, a UAV may get stuck in the area with the local maximum probability for too long before it moves on to another peak.

In this paper, a three-layer architecture using GMM–RHC is presented to solve the cooperative search problem, as shown in Fig. 2. First, GMM is utilized to approximate the probability distribution map, i.e., p(m); the search region S is thereby decomposed and several subregions are extracted, each of which represents a cluster of probability p(m). It is important to note that the number of subregions may be larger than the number of UAVs. Second, these subregions are prioritized hierarchically by evaluating the Gaussian models obtained from GMM and are then allocated to UAVs so as to maximize mission effectiveness. Third, each UAV visits its allocated subregions sequentially and plans its route by the concurrent RHC method based on the history of detection information. In essence, this three-layer architecture can be regarded as a distributed control structure with a certain centralization mechanism, which combines the global optimization performance of centralized methods with the robustness of decentralized methods.

In the proposed method, each subregion obtained by GMM corresponds to only one probability peak, and all subregions are prioritized and allocated to UAVs hierarchically, so each UAV prefers to visit high-probability areas at an early stage. The concurrent RHC method also ensures that the UAVs obtain the maximum payoff at each time step. Hence the local optimum problem is well resolved, and the efficiency of target search is greatly enhanced.

3.2 Searching region decomposition by GMM

The probability distribution map of the target is, in most cases, not completely irregular or unknown. It can be generated by a Bayesian model considering the target's last known position, and it typically contains several well-defined clusters. However, the distribution map of the sensor detection probability in a complex environment (especially the urban environment with scattered buildings studied in this paper) is usually irregular. Hence GMM is only utilized to approximate the probability distribution map of the target.

By describing the population with several Gaussian probability density functions, GMM is a hierarchical model for finding clusters within the population [27], and it is often used in image processing.

First, we assume that there are D data points (\(D\ge M)\) forming the population. These data points are generated as follows: the number of data points located at the grid center \(\left[ {x_m ,y_m } \right] \in \left\{ {1,\ldots ,L_x } \right\} \times \left\{ {1,\ldots ,L_y } \right\} \) of grid \(m(m=1,2,\ldots ,M)\) is proportional to its target probability p(m):

$$\begin{aligned} D_m =p(m)D \end{aligned}$$
(18)

The equation \(\sum \nolimits _{m=1}^M {D_m } =D\) holds by Eq. (1), and this set of data points corresponds to the probability distribution map of the target.

Second, we assume that there are at most K Gaussian components \(G_k ({\varvec{x}})(k=1,\ldots ,K)\) with coefficients \(\alpha _k\) in the mixture model. The probability density function of each Gaussian component \(G_k ({\varvec{x}})\) is:

$$\begin{aligned} G_k ({\varvec{x}})=\frac{1}{\sqrt{(2\pi )^{2}\left| {{\varvec{C}}_k } \right| }}\exp \left( -\frac{1}{2}({\varvec{x}}-{\varvec{\mu }}_k )^{\mathrm{T}}{\varvec{C}}_k^{-1} ({\varvec{x}}-{\varvec{\mu }}_k )\right) \end{aligned}$$
(19)

where \({\varvec{x}}{=}[x,y]^{\mathrm{T}}\) is the two-dimensional position (representing a grid center in this paper), \({\varvec{C}}_k\) is the covariance matrix, and \({\varvec{\mu }}_k\) is the mean vector. With the property \(\sum \nolimits _{k=1}^K {\alpha _k } =1\), the mixture model can be written as:

$$\begin{aligned} p({\varvec{x}})=\sum \limits _{k=1}^K {\alpha _k G_k ({\varvec{x}})} \end{aligned}$$
(20)

Therefore, we need to estimate the parameters of each component, i.e., \(\alpha _k\), \({\varvec{C}}_k\), and \({\varvec{\mu }}_k\). The widely used expectation maximization (EM) method is utilized here and proceeds as follows (a code sketch is given after the list):

1) Initialization. Based on the judgment of intra-cluster distance, the k-means method is first used to obtain clusters with high similarity. The proportion of each cluster is taken as \(\alpha _k\), the mean of each cluster is regarded as \({\varvec{\mu }}_k\), and \({\varvec{C}}_k\) can then be computed.

2) Expectation. For each data point \(d (d=1,\ldots ,D)\) with position \({\varvec{x}}_d\), we compute the posterior probability of \(\alpha _k\) (\(k=1,\ldots ,K)\), denoted as \(\beta _{dk}\):

    $$\begin{aligned} \beta _{dk} =\frac{\alpha _k G_k ({\varvec{x}}_d )}{\sum \nolimits _{i=1}^K {\alpha _i G_i ({\varvec{x}}_d )} } \end{aligned}$$
    (21)
3) Maximization. The parameters \(\alpha _k\), \({\varvec{\mu }}_k\), and \({\varvec{C}}_k\) are updated as follows:

    $$\begin{aligned} \alpha _k= & {} \frac{\sum \nolimits _{d=1}^D {\beta _{dk} } }{D} \end{aligned}$$
    (22)
    $$\begin{aligned} {\varvec{\mu }}_k= & {} \frac{\sum \nolimits _{d=1}^D {\beta _{dk} {\varvec{x}}_d } }{\sum \nolimits _{d=1}^D {\beta _{dk} } } \end{aligned}$$
    (23)
    $$\begin{aligned} {\varvec{C}}_k= & {} \frac{\sum \nolimits _{d=1}^D {\beta _{dk} ({\varvec{x}}_d -{\varvec{\mu }}_k )({\varvec{x}}_d -{\varvec{\mu }}_k )^{\mathrm{T}}} }{\sum \nolimits _{d=1}^D {\beta _{dk} } } \end{aligned}$$
    (24)
4) Convergence. Repeat Steps 2 and 3 until the condition \(\left| {p({\varvec{x}})-{p}'({\varvec{x}})} \right| <\varepsilon \) holds, where \(p({\varvec{x}})\) and \({p}'({\varvec{x}})\) are the values calculated by Eq. (20) using the estimated mixture model at the current and previous iteration, respectively. In addition, if the coefficient of any component becomes small enough (\(\alpha _k <0.01\) in this paper) at the current iteration, that component is eliminated.
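A compact sketch of this EM loop is given below. It replaces the k-means initialization of Step 1 with random seeds and uses a simplified convergence check, so it should be read as an illustration of Eqs. (19)–(24) rather than the exact implementation used in the paper.

```python
import numpy as np

def fit_gmm(points, K=8, eps=1e-4, prune=0.01, max_iter=200):
    """EM fit of the mixture model of Eq. (20), following Eqs. (21)-(24).

    `points` is the D x 2 array of data points generated according to Eq. (18).
    Sketch only: k-means initialization is replaced by random seeds, and the
    convergence test uses the change in the average mixture density.
    """
    D = len(points)
    rng = np.random.default_rng(0)
    mu = points[rng.choice(D, size=K, replace=False)].astype(float)   # stand-in for k-means seeds
    alpha = np.full(K, 1.0 / K)
    C = np.array([np.cov(points.T) + 1e-6 * np.eye(2)] * K)

    def density(x, mu_k, C_k):                                        # Eq. (19), 2-D case
        d = x - mu_k
        quad = np.einsum('ij,jk,ik->i', d, np.linalg.inv(C_k), d)
        return np.exp(-0.5 * quad) / (2 * np.pi * np.sqrt(np.linalg.det(C_k)))

    prev = None
    for _ in range(max_iter):
        dens = np.stack([alpha[k] * density(points, mu[k], C[k])
                         for k in range(len(alpha))], axis=1)
        beta = dens / dens.sum(axis=1, keepdims=True)                 # Eq. (21)
        alpha = beta.mean(axis=0)                                     # Eq. (22)
        mu = (beta.T @ points) / beta.sum(axis=0)[:, None]            # Eq. (23)
        C = np.array([np.einsum('i,ij,ik->jk', beta[:, k], points - mu[k], points - mu[k])
                      / beta[:, k].sum() for k in range(len(alpha))]) # Eq. (24)
        keep = alpha >= prune                                         # drop weak components
        alpha, mu, C = alpha[keep] / alpha[keep].sum(), mu[keep], C[keep]
        cur = dens.sum(axis=1).mean()
        if prev is not None and abs(cur - prev) < eps:                # convergence check
            break
        prev = cur
    return alpha, mu, C
```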

The subregions can be extracted from the search area based on the parameters of the Gaussian components. For each Gaussian component, there is a corresponding area in which the probability mass of the standard Gaussian function within three standard deviations is 99.7%. The weight of this subregion is \(\alpha _k\). This subregion is an elliptical area centered at \({\varvec{\mu }}_k\) whose two axial lengths are \(3\sigma _{{x_{k}}}\) and \(3\sigma _{y_k }\), respectively, in the rotated coordinate system whose axes are the two eigenvectors of \({\varvec{C}}_k\); \(\sigma _{{x_{k}}}\) and \(\sigma _{{y_{k}}}\) are the square roots of the eigenvalues of \({\varvec{C}}_k\) corresponding to these eigenvectors. For one scenario, the target probability distribution map and its mixture result by GMM with \(K=8\) are shown in Fig. 3. As can be seen, GMM approximates the real probability distribution map well.
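The extraction of one elliptical subregion from a fitted component can be sketched as follows; the Mahalanobis-distance membership test is an equivalent way of expressing the 3-sigma ellipse and is added here only for illustration.

```python
import numpy as np

def subregion_ellipse(mu_k, C_k):
    """3-sigma elliptical subregion of one Gaussian component.

    Returns the center, the axial lengths 3*sigma_x and 3*sigma_y, and the unit
    eigenvectors of C_k that serve as the rotated coordinate axes.
    """
    eigvals, eigvecs = np.linalg.eigh(C_k)       # C_k is symmetric positive definite
    return mu_k, 3.0 * np.sqrt(eigvals), eigvecs

def grid_in_subregion(x, mu_k, C_k):
    """A grid center x belongs to the subregion if its Mahalanobis distance from
    mu_k is at most 3, which is equivalent to lying inside the 3-sigma ellipse."""
    d = np.asarray(x, dtype=float) - mu_k
    return float(d @ np.linalg.inv(C_k) @ d) <= 9.0
```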

Fig. 3 Approximation of probability distribution map by GMM. a Probability distribution. b Gaussian mixture result

3.3 Subregion prioritization and allocation

We first discuss the single-UAV search problem. The subregions \(\{S_1 ,\ldots ,S_K \}\) obtained by GMM should first be prioritized, and the UAV then visits them in that sequence; in this way, the UAV can collect a larger detection payoff. In this paper, three factors, namely the subregion reward, the subregion area, and the region-transferring distance, constitute the priority index of a subregion.

As discussed in Sect. 3.2, the probability mass of a standard Gaussian function within each subregion is 99.7%. Based on the coefficient of each Gaussian component in the mixture model, we obtain the subregion reward:

$$\begin{aligned} R_k =0.997\alpha _k \end{aligned}$$
(25)

Each subregion is an elliptical area with axial lengths \(3\sigma _{{x_{k}}}\) and \(3\sigma _{{y_{k}}}\). From Eq. (6), it is obvious that not only the target probability but also the detection probability g(m) influences the search behavior. From Eq. (2), in the urban environment g(m) is 0 if grid m lies in the region \({\varvec{X}}^R\) or \({\varvec{X}}^D\). Hence these areas should be subtracted when computing the subregion area:

$$\begin{aligned} A_k =\pi \cdot 3\sigma _{{x_{k}}} \cdot 3\sigma _{{y_{k}}} -\hbox {Area}(S_{{\varvec{X}}^R \cup {\varvec{X}}^D } \hbox {|}S_{{\varvec{X}}^R \cup {\varvec{X}}^D } \subset S_k ) \end{aligned}$$
(26)

where \(\hbox {Area}(S_{{\varvec{X}}^R \cup {\varvec{X}}^D } {|}S_{{\varvec{X}}^R \cup {\varvec{X}}^D } \subset S_k )\) denotes the area of \({\varvec{X}}^R\) and \({\varvec{X}}^D\) contained in subregion \(S_k\). In this paper, \(A_k\) essentially represents the time spent on coverage flight within the subregion.

The region-transferring distance is the distance from one subregion to another; when an end point exists, the distance between the transferred subregion and the destination is also included. At the first iteration of region classification (\(c=1)\), the region-transferring distance is defined as:

$$\begin{aligned} L_k =\left\{ {{\begin{array}{l@{\quad }l} \left\| {{\varvec{x}}^{0}-{\varvec{\mu }}_k } \right\| +\left\| {{\varvec{x}}^{f}-{\varvec{\mu }}_k } \right\| &{} \hbox {With end point} \\ \left\| {{\varvec{x}}^{0}-{\varvec{\mu }}_k } \right\| &{} \hbox {otherwise} \\ \end{array} }} \right. \end{aligned}$$
(27)

where \({\varvec{x}}^{0}\) is the start point and \({\varvec{x}}^{f}\) is the end point. At subsequent iterations (\(c\ge 2)\), the region-transferring distance is defined as:

$$\begin{aligned} L_k =\left\{ {{\begin{array}{l@{\quad }l} \left\| {{\varvec{\mu }}_{{I_{c-1}}} -{\varvec{\mu }}_k } \right\| +\left\| {{\varvec{x}}^{f}-{\varvec{\mu }}_k } \right\| &{} \hbox {With end point} \\ \left\| {{\varvec{\mu }}_{{I_{c-1}}} -{\varvec{\mu }}_k } \right\| &{} \hbox {otherwise} \\ \end{array} }} \right. \end{aligned}$$
(28)

where \({\varvec{\mu }}_{{I_{c-1}}}\) is the center of the subregion with the highest priority selected from the remaining subregions at iteration \(c-1\). \(L_k\) represents the time spent on the transition between subregions.

The priority index of subregion \(S_k\) is then defined as:

$$\begin{aligned} \hbox {EP}_k =\frac{R_k }{A_k +L_k } \end{aligned}$$
(29)

The larger \(\hbox {EP}_k\) is, the higher the priority of the subregion. At the cth iteration, by sorting the priority indexes of the remaining subregions, we obtain the one with the maximum value and take it as the c-level subregion. It is then removed from the remaining subregion set for the next iteration (\(c=c+1)\), and the above steps are repeated until all subregions are prioritized. That is to say, the process terminates after K iterations.
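A sketch of this iterative prioritization, following Eqs. (25)–(29), is given below; the argument names and the assumption that rewards, areas, and centers are precomputed are illustrative.

```python
import numpy as np

def prioritize_subregions(rewards, areas, centers, start, end=None):
    """Greedy prioritization of subregions by the index EP_k of Eq. (29).

    rewards[k], areas[k], centers[k] correspond to R_k, A_k, and mu_k of
    Eqs. (25)-(26); `start` and the optional `end` are the UAV start and end points.
    """
    remaining = list(range(len(rewards)))
    order, ref = [], np.asarray(start, dtype=float)
    while remaining:
        best, best_ep = None, -np.inf
        for k in remaining:
            L_k = np.linalg.norm(ref - centers[k])               # Eqs. (27)-(28)
            if end is not None:
                L_k += np.linalg.norm(np.asarray(end, dtype=float) - centers[k])
            ep = rewards[k] / (areas[k] + L_k)                   # Eq. (29)
            if ep > best_ep:
                best, best_ep = k, ep
        order.append(best)
        remaining.remove(best)
        ref = np.asarray(centers[best], dtype=float)             # next transfer starts here
    return order
```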

Table 1 Pseudocode of RHC-based concurrent strategy

For the single-UAV search problem, the prioritized areas are visited by the UAV in order. For the multi-UAV search problem, however, the subregions should first be allocated to UAVs according to an optimality index. Suppose that \(K_i\) subregions are selected for each UAV \(i(i=1,2,\ldots ,N_u )\) from the whole set \(\{S_1 ,\ldots ,S_K \}\), so that \(\sum \nolimits _{i=1}^{N_u } {K_i } =K\) holds. We first prioritize these subregions for UAV i by Eqs. (25)–(29), and the results are denoted as \(\{\bar{{S}}_1 ,\ldots ,\bar{{S}}_{{K_{i}}} \}\). We then compute the evaluation index of the subregions \(\{\bar{{S}}_1,\ldots ,\bar{{S}}_{{K_{i}}}\}\):

$$\begin{aligned} \hbox {EA}_i =\sum \limits _{j=1}^{K_i } {\frac{\left( {A_j +L_j } \right) }{R_j }} \end{aligned}$$
(30)

where \(A_j\), \(R_j\), and \(L_j\) are the three factors computed by Eqs. (25)–(28). Note, however, that the distance between the region centroid and the end point is not taken into account when calculating \(L_j\) except for the last subregion (\(j=K_i)\).

Then the total index of task allocation is defined as:

$$\begin{aligned} \hbox {EA}=\lambda _1 \sum \limits _{i=1}^{N_u } {\hbox {EA}_i } +\lambda _2 \cdot {\mathop {\mathop {\sum }\limits _{i,j=1}}\limits _{i\ne j}^{N_u }} {\left| {\hbox {EA}_i -\hbox {EA}_j } \right| } \end{aligned}$$
(31)

where the first part reflects the total cost of the UAVs, the second part balances the efficiency across the UAV group, and \(\lambda _1\) and \(\lambda _2\) are weighting coefficients. The UAV group performs the search mission more efficiently with a smaller \(\hbox {EA}\), so the optimal allocation strategy is determined as:

$$\begin{aligned} \hbox {S}_{\mathrm{allocation}}^*=\hbox {argmin} (\hbox {EA}) \end{aligned}$$
(32)

Based on Eq. (32), the auction algorithm [28] is used here to compute the optimal task allocation scheme.
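For illustration only, the sketch below evaluates the allocation index of Eqs. (30)–(32) and minimizes it by brute-force enumeration instead of the auction algorithm used in the paper; `eval_index` is a hypothetical callback returning \(\hbox {EA}_i\) for a given subset of subregions.

```python
import itertools

def allocation_cost(assignment, eval_index, lam1=0.7, lam2=0.3):
    """Total allocation index EA of Eq. (31) for one candidate assignment.

    `assignment[i]` is the tuple of subregion indices given to UAV i, and
    `eval_index(subset, i)` is a hypothetical callback returning EA_i of Eq. (30).
    """
    ea = [eval_index(subset, i) for i, subset in enumerate(assignment)]
    balance = sum(abs(a - b) for a, b in itertools.permutations(ea, 2))
    return lam1 * sum(ea) + lam2 * balance

def best_allocation(K, N_u, eval_index):
    """Brute-force minimization of Eq. (32) over all assignments of K subregions."""
    best, best_cost = None, float('inf')
    for labels in itertools.product(range(N_u), repeat=K):        # which UAV gets each subregion
        assignment = [tuple(k for k in range(K) if labels[k] == i) for i in range(N_u)]
        cost = allocation_cost(assignment, eval_index)
        if cost < best_cost:
            best, best_cost = assignment, cost
    return best, best_cost
```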

3.4 Searching behavior decision by RHC-based concurrent strategy

3.4.1 RHC-based concurrent strategy

After the subregions are prioritized and assigned, each UAV should visit its allocated subregions with centroids \(\{\bar{{S}}_1 ,\ldots ,\bar{{S}}_{{K_{i}}} \}\) in order. Taking UAV i as an example, the RHC-based concurrent strategy is utilized to optimize the search behavior.

First, the UAV path segments are initialized, including the shortest path \(\phi _0\) from the start point to the first subregion centroid with flight time \(t_0\), the initial coverage path of each subregion \(\phi _j =\{\bar{{S}}_j \}\) (\(j=1,\ldots ,K_i)\) with time \(t_j =1\), the shortest paths between adjacent subregions \(\phi _{j\rightarrow j+1} (j<K_i)\) with flight time \(t_{j\rightarrow j+1}\), and the shortest path \(\phi _f\) from the last subregion to the end point with required time \(t_f\).

Fig. 4 Illustration of the concurrent RHC method with 3 subregions (\(K_i =3)\). a Initialization of coverage path. b Coverage path with more flight time

Second, if the sum of all the above times is less than T, a new waypoint is added to the selected path segment as follows. For the local path segments \(\phi _j\) of all the subregions, the next-step behaviors are planned by the RHC method concurrently, but only the subregion \(j^{{*}}\) with the maximum predicted payoff is chosen; its planned result becomes the newly added waypoint of the local path \(\phi _{{j^{*}}}\), and the time is updated by \(t_{{j^{*}}}=t_{{j^{*}}}+1\). Then, the shortest region-transferring path \(\phi _{{j^{*}}\rightarrow j^{{*}}+1}\) from this waypoint to the next subregion center (if \(j^{{*}}<K_i)\), or the shortest path segment \(\phi _f\) between this waypoint and the end point (if \(j^{{*}}=K_i)\), is re-planned, and the corresponding time, \(t_{j^{{*}}\rightarrow j^{{*}}+1}\) or \(t_f\), is updated as well. The above procedure continues until the total time equals T.

Third, we obtain the UAV path by combining the path segments sequentially as \(\{\phi _{0},\phi _{1},\phi _{1\rightarrow 2} ,\phi _{2} ,\ldots ,\phi _{\left( {K_{i}} -1 \right) \rightarrow {K_{i}}} ,\phi _{{K_{i}}} ,\phi _{f}\}\), and the decision-making process ends.

With the above concurrent RHC strategy, the search behavior with the maximum payoff among the subregions is added to the path at each step. This strategy ensures that the time constraint, initial constraint, and terminal constraint are easily satisfied. In this paper, our previously proposed interfered fluid dynamic system (IFDS) method, which has high computational efficiency, is utilized to plan the shortest region-transferring paths, along which no-fly zones or obstacles can easily be avoided [29]. When the coverage paths within a subregion are calculated by RHC, a safety cost is introduced into the evaluation index, as described in Sect. 3.4.2; hence this method also satisfies the obstacle avoidance constraint. Besides, the UAVs rarely collide with each other, since they search for the target in different subregions, so the collision avoidance constraint is satisfied as well.

The pseudocode of the RHC-based concurrent strategy is given in Table 1, and the strategy is illustrated in Fig. 4, where the planned path covers more area as the flight time increases.
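A simplified sketch of the concurrent strategy's main loop is given below. It is not the authors' exact pseudocode from Table 1: `rhc_step` and `shortest_path` are hypothetical stand-ins for the RHC planner of Sect. 3.4.2 and the IFDS planner [29], and path lengths are used as a proxy for flight times.

```python
def concurrent_rhc(start, goal, subregions, T, rhc_step, shortest_path):
    """Skeleton of the RHC-based concurrent strategy (cf. Table 1).

    `subregions` is the ordered list of allocated subregion centers; `rhc_step`
    extends a coverage segment by one waypoint and returns (waypoint, predicted
    payoff); `shortest_path` stands in for the IFDS planner.
    """
    cover = [[c] for c in subregions]                       # phi_1, ..., phi_Ki
    entry = shortest_path(start, subregions[0])             # phi_0
    links = [shortest_path(subregions[j], subregions[j + 1])
             for j in range(len(subregions) - 1)]           # phi_{j -> j+1}
    exit_ = shortest_path(cover[-1][-1], goal)              # phi_f

    def total_time():
        return (len(entry) + sum(len(c) for c in cover)
                + sum(len(l) for l in links) + len(exit_))

    while total_time() < T:
        candidates = [rhc_step(seg) for seg in cover]       # plan one step in every subregion
        j_star = max(range(len(cover)), key=lambda j: candidates[j][1])
        cover[j_star].append(candidates[j_star][0])         # keep only the best subregion's step
        if j_star < len(cover) - 1:                         # re-plan the transfer path ...
            links[j_star] = shortest_path(cover[j_star][-1], subregions[j_star + 1])
        else:                                               # ... or the path to the end point
            exit_ = shortest_path(cover[-1][-1], goal)

    trajectory, segments = [], [entry]
    for j, seg in enumerate(cover):
        segments.append(seg)
        if j < len(links):
            segments.append(links[j])
    segments.append(exit_)
    for seg in segments:
        trajectory.extend(seg)                              # final combined path
    return trajectory
```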

3.4.2 RHC method

The standard RHC method [30, 31] is utilized here to plan the coverage path within each subregion. The state of UAV i at time t is \({\varvec{z}}_i^t =\left[ {x_i^t ,y_i^t ,\varphi _i^t } \right] ^{\mathrm{T}}\), and the control input is \(u_i^t =\omega _i^t\), as defined in Sect. 2.2. The future control sequence with horizon length N is defined as \({\varvec{u}}_i^{t:t+N-1}=[u_i^t ,\ldots ,u_i^{t+N-1} ]\), and the future states \({\varvec{z}}_i^{t:t+N-1}=[{\varvec{z}}_i^t ,\ldots ,{\varvec{z}}_i^{t+N-1} ]\) can be predicted from the UAV model of Eqs. (12)–(13). The objective function is denoted as \(J\left( {{\varvec{z}}_i^t ,{\varvec{u}}^{t:t+N-1}} \right) \):

$$\begin{aligned}&J\left( {{\varvec{z}}_i^t ,{\varvec{u}}^{t:t+N-1}} \right) \nonumber \\&\quad =\left\{ {{\begin{array}{l@{\quad }l} \sum \nolimits _{j=0}^{N-1} {\Delta P^{t+j} ({\varvec{x}}_i^{t+j} )} &{} \hbox {if }{\varvec{x}}_i^{t+j} \notin {\varvec{X}}^{R} \\ -\infty &{} \hbox {otherwise}\\ \end{array} }} \right. \end{aligned}$$
(33)
Fig. 5 Planning results by GMM–RHC. a Probability distribution of Scenario I. b GMM result. c Detection difficulty. d Collected probability on first visit. e Paths in 3D space. f Paths in 2D space. g Detection payoff versus time

where \({\varvec{x}}_i^{t+j} ={\varvec{Hz}}_i^{t+j}\) is the UAV position at time \(t+j\), with \({\varvec{H}}=[1\, 0\, 0;\, 0\, 1\, 0]\). \(\Delta P^{t+j} ({\varvec{x}}_i^{t+j} )\) is the detection payoff of one more look over the grid, as defined in Eq. (7). The penalty term is introduced to keep the UAV out of no-fly zones.

Then we can obtain the optimal control sequence \({\varvec{u}}_i^{t^{*}:(t+N-1)^{*}}\) by maximizing the objective function:

$$\begin{aligned} {\varvec{u}}_i^{t^{*}:(t+N-1)^{*}}=\hbox {arg max}J\left( {{\varvec{z}}_i^t ,{\varvec{u}}^{t:t+N-1}} \right) \end{aligned}$$
(34)

Only the first term, i.e., \(u_i^{{t^{*}}}\), is applied to obtain the predicted next-step point; the remaining terms are discarded.
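A sketch of this receding-horizon decision, enumerating the \(3^N\) heading-rate sequences and scoring them with Eqs. (7) and (33), is shown below; the grid-indexed arrays and the rounding of predicted positions to grid centers are simplifying assumptions.

```python
import itertools, math

def rhc_control(state, looks, p, g, restricted, N=4, v=1.0, dt=1.0):
    """One receding-horizon decision, Eqs. (33)-(34).

    Enumerates all heading-rate sequences in {-45, 0, 45}^N, scores each with the
    cumulative one-look gain of Eq. (7), rejects sequences that enter the no-fly
    zone X^R, and returns only the first control of the best sequence.  Sketch:
    p and g are 2-D arrays over the grid, looks is a dict of visit counts.
    """
    best_u, best_J = 0.0, float('-inf')
    for seq in itertools.product((-45.0, 0.0, 45.0), repeat=N):
        x, y, phi = state
        future, J = dict(looks), 0.0
        for omega in seq:
            phi = (phi + omega * dt) % 360.0
            v_eff = v * math.sqrt(2.0) if phi % 90.0 == 45.0 else v   # diagonal moves, Sect. 2.2
            x += v_eff * math.cos(math.radians(phi)) * dt
            y += v_eff * math.sin(math.radians(phi)) * dt
            cell = (int(round(x)), int(round(y)))
            if cell in restricted:                                    # penalty branch of Eq. (33)
                J = float('-inf')
                break
            J += (1.0 - g[cell]) ** future.get(cell, 0) * g[cell] * p[cell]   # Eq. (7)
            future[cell] = future.get(cell, 0) + 1
        if J > best_J:
            best_u, best_J = seq[0], J
    return best_u                                                     # only u_t* is applied, Eq. (34)
```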

3.5 Limitations and discussions

In the multi-UAV target search problem, we assume that the target probability distribution and the topographic information are known in advance and stationary. However, the real situation is sometimes dynamic or unknown, so some preliminary improvements are introduced into the GMM–RHC method.

We suppose that it takes some time for the UAVs to reach the target region after the a priori information about the target probability distribution is obtained. The potential motion region of the target can then be expressed as a sector area whose size and shape are defined by the admissible ranges of velocity and heading angle. If the target motions in different directions are independent and the acceleration remains constant during this time period, we can predict the initial target probability distribution map on the basis of a Gaussian distribution model.

Considering the case in which the target moves at an average speed (much smaller than the UAV speed) during the whole search process, we introduce dilatation parameters \(\lambda _x\, \left( {\lambda _x>1} \right) \) and \(\lambda _y \,\left( {\lambda _y >1} \right) \) to expand the extracted subregions. Each subregion then becomes an elliptical area with axial lengths \(3\lambda _x \sigma _{{x_{k}}}\) and \(3\lambda _y \sigma _{{y_{k}}}\).

Fig. 6 Searching paths of UAVs with different \(g_s\). a \(g_s =0.5\). b \(g_s =0.2\)

Sometimes the topographic information (i.e., the distribution of buildings or threats in the urban environment) is unknown or dynamic. In this case, the first and second procedures of GMM–RHC (i.e., the approximation and decomposition of the search region) remain unchanged, and only the third procedure (i.e., the path planning of a single UAV) is adjusted, in order to simplify the algorithm. Once a UAV detects new topographic information at time t, it re-plans its remaining search path by utilizing the RHC-based concurrent strategy, where the duration becomes \(T-t\) and the current UAV position is taken as the initial point. Besides, the dynamic IFDS method [32] is used to plan the shortest paths between adjacent subregions.

Fig. 7 Total number of looks over grid with different \(g_s\). a \(g_s =0.95\). b \(g_s =0.5\). c \(g_s =0.2\)

Fig. 8 Detection payoff versus time with different \(g_s\)

Fig. 9 Detection payoff versus time with different \(R_s\)

4 Numerical simulations

In this paper, constrained by hardware availability and the low safety margins of UAVs in a real urban environment, the cooperative search by GMM–RHC is verified through simulations. To test the performance of the proposed target search method, several simulation experiments are conducted in MATLAB R2011a on a PC with an Intel Core i5 CPU running at 2.5 GHz. Each experiment is run 20 times independently, and the statistical results, mainly detection efficiency and running time, are used for analysis. An urban area of size 2500 m \(\times \) 2500 m is taken as the search region, which is discretized uniformly into 50 \(\times \) 50 grids or cells. The target probability distribution map is randomly generated as the sum of 10 exponential functions. Unless otherwise specified, the other parameters are set as follows: total time \(T=500\) s, number of UAVs \(N_u =3\), UAV flight altitude \(H=100\) m, sensor detection probability \(g_s =0.95\), FOV radius \(R_s =50\) m, maximum number of Gaussian components \(K=8\), number of training data points \(D=10{,}000\), time horizon length \(N=4\), and weights \(\lambda _1 =0.7\) and \(\lambda _2 =0.3\).

4.1 Target searching by GMM–RHC

Three UAVs with start points (200, 200) m, (800, 100) m, and (2000, 500) m search for the target in Scenario I, and their end points are (1000, 2200) m, (1200, 2300) m, and (1700, 2200) m, respectively. The probability distribution map is shown in Fig. 5a, and its GMM result with 7 approximated subregions is shown in Fig. 5b. High-rise buildings (dark green) and low buildings (yellow) are distributed randomly in Scenario I, as shown in Fig. 5e. Hence we can compute the detection probability g(m); the detection difficulty, i.e., \(1-g(m)\), is displayed in Fig. 5c. By combining the target probability distribution map and the detection difficulty map, we obtain the detection payoff, i.e., the probability collected on the first visit to each grid, as shown in Fig. 5d. Under the proposed allocation strategy, the sorted subregions {7,4,6} are allocated to UAV1, subregion {1} alone is allocated to UAV2, and the sorted subregions {2,3,5} are allocated to UAV3. The search paths of the three UAVs are shown in Fig. 5e, f. As can be seen, the paths cover the regions with high target probability, all high-rise buildings are avoided, and the areas with higher probability are detected earlier. Figure 5g illustrates the cumulated detection probability of each UAV and the total payoff of the UAV group over time, computed by Eq. (8). With the real payoff \(P_M^T (\phi )=0.6093\) and the ideal payoff \(P_M^T (\phi _{\mathrm{ideal}} )=0.7364\), the efficiency is 82.74%. In addition, the computed ranges of the pairwise distances between UAVs are [493, 2760] m, [550, 2371] m, and [100, 2187] m, respectively, meaning that the UAVs never collide with each other.

Fig. 10 Planning results with different methods. a Probability distribution of Scenario II. b Searching paths by RHC (\(T=500\) s). c Searching paths by Voronoi-RHC (\(T=500\) s). d Searching paths by GMM–RHC (\(T=500\) s)

4.2 Comparison between different parameters

In Scenario I, we also set other values of the sensor detection probability (\(g_s =0.5,0.2)\) for comparison. The paths are shown in Fig. 6, and the number of looks over each grid is shown in Fig. 7. As can be seen, if the sensor detection probability or reliability increases, the paths cover a larger portion of the region, and there are fewer repeated looks over each grid. Besides, with a higher \(g_s\), the search performance is better, as shown in Fig. 8.

In the above simulation, each UAV sensor covers exactly one grid with detection range \(R_s =50\) m. For comparison, different sensor ranges (\(R_s =100, 150, 200\) m) are set in Scenario I, and the resulting detection payoff curves are shown in Fig. 9. The larger the sensor range, the more payoff the UAVs obtain.

Fig. 11 Detection payoff versus time by different methods (\(T=500\) s)

Table 2 Performance of different methods

4.3 Comparison between different methods

To further verify the performance of the proposed GMM–RHC method, five other methods, namely Random Search, Zamboni Search, Hill Climbing, RHC, and Voronoi-RHC, are also used for comparison. As the paths of the standard Zamboni Search cannot avoid obstacles, the IFDS method is introduced to modify the Zamboni paths. The Voronoi-RHC method works as follows: the region is first partitioned into \(N_u\) subregions by a Voronoi graph, and RHC is then utilized for single-UAV path planning. As these five methods cannot handle the terminal constraints, no UAV end point is required here. Suppose that three UAVs with start points (100, 100) m, (300, 100) m, and (500, 100) m search Scenario II with duration \(T=500\) s; the target probability distribution map is shown in Fig. 10a. As can be seen, there is a high-value subregion in the lower right corner, but it is very far away from the other subregions.

Owing to space limitations, only the trajectories of the UAVs obtained by RHC, Voronoi-RHC, and GMM–RHC are illustrated in Fig. 10b–d, respectively. Without region prioritization and allocation, the search paths by RHC get stuck in the other subregions and fail to cover the lower right one. With the GMM–RHC method, each UAV performs the search explicitly in its own allocated subregions, so the local optimum problem is well resolved. As can be seen, the UAV routes cover all subregions, and in particular the high-value lower right subregion is searched by UAV3. Although the paths by Voronoi-RHC cover all subregions as well, the UAVs are not immediately directed to the centroids of the subregions with peak probability, so some time is wasted in the early stage of the search.

The comparison between these algorithms with respect to the detection payoff is shown in Fig. 11. The thick black line is the payoff of the ideal paths \(\phi _{\mathrm{ideal}}\), representing the theoretical upper limit. The payoff curve of the GMM–RHC method is the closest to the ideal curve, so it performs best with the highest detection payoff. Voronoi-RHC is the second-best method and is superior to RHC and Hill Climbing. Random Search and Zamboni Search gain the least probability here.

Table 2 shows the performance of the six methods with different duration times (\(T=100\), 300, 500, 700 s). For the different duration times, the ranking of these algorithms with respect to efficiency is almost the same as the conclusion drawn from Fig. 11. The exception is that Zamboni Search has a higher efficiency (86.72%) than Hill Climbing (67.72%) and RHC (83.01%) when \(T=700\) s. The advantage of GMM–RHC is especially pronounced when the duration T is small, showing that GMM–RHC guides the UAVs to visit the high-probability subregions first. The ranking in terms of running time is roughly the opposite of that in terms of efficiency: Random Search, Zamboni Search, and Hill Climbing require very little time, whereas the times of RHC, Voronoi-RHC, and GMM–RHC are much longer, with GMM–RHC needing the most running time. However, the running time of GMM–RHC is still acceptable considering the total flight time.

4.4 Statistical results of different methods in 20 tests

Using the six methods above, 20 Monte Carlo experiments with duration \(T=300\) s are executed, where the target probability distribution map and the topographic information are both generated randomly in each test. The comparison of search efficiency is shown in Fig. 12. The statistical results lead to the same conclusions as Sect. 4.3, and the proposed GMM–RHC method achieves the highest search efficiency in most cases. Hence the GMM–RHC method is robust across various scenarios.

Fig. 12 Efficiency of different methods in 20 tests (\(T=300\) s)

5 Conclusion

The cooperative search problem for multiple UAVs in a complex environment is studied in this paper. Based on the prior information of the target probability distribution map, the search subregions are extracted by the GMM method, prioritized, and then allocated to UAVs. The RHC-based concurrent method is then presented to obtain the coverage route of a single UAV. Overall, with the three-layer structure of the GMM–RHC method, the cooperative problem can be simplified into single-UAV planning problems. Besides, the various constraints, including obstacle avoidance, collision avoidance, and simultaneous arrival at given destinations, can easily be satisfied. More importantly, with this hybrid method the UAVs prefer to visit the areas with higher target probability, which maximizes the search payoff within the limited flight duration and effectively avoids the local optimum problem. Experimental results show that although the running time of GMM–RHC is longer than that of the other methods, the search efficiency is greatly enhanced. Future work will focus on implementing the algorithm on a real UAV platform.