1 Introduction

Recently, inexpensive mobile autonomous agents, such as unmanned aerial vehicles (UAVs), have received increasing interest for remote sensing [1], disaster monitoring [2], emergency assistance [3], relay connectivity [4] and border surveilance [5]. Although single-UAV systems have been used for decades, the overall capability can be extended by multiple UAVs. In multi-UAV systems, failure of a single UAV causes the network to reorganize and to maintain communication through other nodes. This would not be possible in single-UAV systems [6]. In terms of communication needs, single-UAV systems are linked and restricted to the coverage of the ground station. While with a team of UAVs by establishing an ad hoc network, it extends the scope of system. In this way, the UAVs can communicate with each other and only a subset of them need to connect with the ground station [7]. More importantly, the missions can be completed faster with collaboration of multiple UAVs. At the same time, it’s able to bring increased challenges to coordinate multi-UAV systems and high requirements to the communication network.

In multi-UAV systems, the state information and sensor data of UAVs need to be transmitted to the ground station. Furthermore, UAVs need to communicate with each other frequently in order to exchange information about the coordination and the observations of the world. However, they are constrained by limited energy ability, storage capacity and communication bandwidth. UAVs can’t operate without fuel or battery replacement for a long time. Recently, some works have been explored to find different ways to extend the lifetime of network, using optimization [8] and path planning algorithm [9]. Therefore, there is a need to design a communication strategy by which the energy can be used efficiently.

Many works use UAVs as aerial base stations that provide a high chance of Line-of-Sight (LoS) links to ground users. The performance analysis of a flying base station by stochastic geometry is studied [10]. In order to effectively deploy UAVs, a novel algorithm based on the machine learning framework is proposed to maximizing the users’ quality of experience using the minimum total transmit power [11]. For many applications, some UAVs have to collect and disseminate their information to the ground station that may be located far from the operational area. Since wireless channels decay with line-of-sight obstacles or increasing distance, the inability to transmit sensing data to the ground station in real time may render the multi-UAV systems ineffective [12]. Furthermore, the poor communication may generate potentially dangerous system and prevent agreement in the team. A solution is to consider using UAVs as relays for failed links which makes the routing become a crucial problem. Nowadays, there is a myriad of literature on routing protocols for multi-UAV systems. For instance, the position-based routing protocols [13, 14] and the hierarchical routing protocols [15,16,17,18,19]. However, all these research works only care about the network connectivity without taking into consideration the mission of UAVs. In fact, the UAVs in general are mission-based and their mobility models depend on the tasks they plan to accomplish. Therefore, it needs to design a proper mission-based scheme to form the links for UAVs which are far away to the ground station.

Multi-UAV systems, also called flying ad hoc networks (FANETs) [7], are a special case of mobile ad hoc networks (MANETs) with a high degree of mobility. The topology of these networks can change more frequently and the UAVs might locate out of the scope of ground station. This makes the centralized approach [8] that requires the ground station to run the algorithm at each time instant become not applicable. In general, by a distributed algorithm [12, 20], the network could control itself to support the connections between nodes. Therefore, the need for developing a distributed solution is desirable.

However, nodes are generally selfish in wireless network. Individuals may have different levels of interest if they are “designed, owned, or operated by several organisations that may have different goals” [21]. Hence, a network formation method considering global connectivity is needed. Game theory, as an important tool, is widely used in many fields [22]. It can provide a good framework for modelling and analyzing the interactions among multiple players that choose their own strategy independently according to their local information. A network formation game is formulated to achieve small base stations backhauling by designing a UAV-based multi-hop network [23]. The paths connecting the small base stations and gateway can be distributed exploited by studying a pairwise stability at the expense of negotiation to solve the network formation game. But this scheme had normal probability to get trapped in cycle networks.

In this paper, we consider the UAVs execute reconnaissance tasks where the sensing information needs to be transmitted to ground station. The UAVs need to autonomously learn how to form air-to-air (A2A) and air-to-ground (A2G) links. The main contributions of this paper are:

  • We propose a system model which adopts a metric that jointly consider achievable rate, delay and energy consumption. Due to the wireless channels decay and limited energy, a self-organizing and multi-hop network is established.

  • The game theory is used to study the interactions among UAVs to get the network structure connecting the UAVs to their serving ground station. This allows that every UAV is interested in increasing its individual utility and making a decision based on the current network topology.

  • A distributed myopic Nash network formation algorithm is proposed which is a hybrid of the pure-strategy and the mixed-strategy Nash network formation algorithm in order to avoid getting cycle networks.

The rest of this paper is organized as follows: In Sect. II, preliminaries and system model are given. Section III details the game and presents the proposed myopic Nash network formation algorithm. The performance evaluation is analyzed in Sect. IV. Finally, conclusions are drawn in Sect. V.

2 Preliminaries and system model

Consider a scenario where a team of N UAVs is given a mission to be static to survey areas of interest and send the data back to the ground station. The sensing information generates from the onboard radar and camera or ground sensors where the UAVs may be static to collect. However, the sensing areas are hard to reach by ground vehicles or located at distant. Along with the communication distance increases, the received signal quality deteriorates. As a result, the network that connects the sensing UAVs directly to the ground station is either unavailable or limited in data transmission. To address the issue, it’s necessary to extend the communication range based on a multi-hop aerial network to sense the areas. Here, cooperative communication among UAVs is adopted, where the role of UAVs, as intermediate relays, is used to serve different transceivers. Therefore, a tree network with the ground station as the root node is formed. The UAVs in the network transmit the sensing data packets to the ground station through one or more hops in the formed tree.

Fig. 1
figure 1

An example of the tree model

In our model, there are some assumptions. First of all, at least one UAV has access to the ground station. Due to the restriction of the camera resolution, UAVs are limited to a few hundred meters as low-altitude platform (LAP). For the air-to-ground (A2G) and air-to-air (A2A) links, the Doppler effect due to the mobility of UAVs is assumed to be perfectly compensated. Figure 1 illustrates an example of our tree model. Every sensing UAV i in the network senses its own area of interest based on the task allocation. The information is generated at a constant rate equal to \(\lambda _i\) packets per second. Moreover, UAV i also receives the packets from other UAVs that are connected to it. The role of UAV i is to relay these packets to the next hop.

2.1 Achievable rate

The wireless channel follows the free-space path loss model, expressed as [24]:

$$\begin{aligned}&{\xi (dB)} = \displaystyle 10{\alpha }\log \left( {\frac{{4\pi {f_c}{d_{o,d}}}}{c}} \right) , \end{aligned}$$
(1)

where \(\alpha \ge 2\) is the path loss exponent, \(f_c\) is the system carrier frequency, \(d_{o,d}\) is the distance between the origin node o with the destination node d, c is the speed of light.

For modeling the A2G propagation channel, one common approach is to consider a probabilistic line-of-sight (LoS) and non-line-of-sight (NLoS) links [24]. Note that for NLoS links due to the shadowing and diffraction loss, attenuations is higher than the LoS links. Hence, the adopted path loss between UAV i and the ground station g can be given by [23]:

$$\begin{aligned}&{{L}_{i,g}}=\left\{ \begin{matrix} \displaystyle {{\xi }_{i,g}}+{{\psi }_{LoS }}, & \quad \text {LoS\,link,} \\ \displaystyle {{\xi }_{i,g}}+{{\psi }_{NLoS }}, & \quad \text {NLoS\, link,} \\ \end{matrix} \right. \end{aligned}$$
(2)

where \({\psi }_{LoS }\) and \({\psi }_{NLoS }\) are the additional attenuation factor caused by shadow fading. Here, the probability of LoS connection depends on the environment, the location of the ground station and the UAV, and the elevation angle between them. The LoS probability is given by [25]:

$$\begin{aligned}&P_{i,g}^{LoS }=\displaystyle \frac{1}{1+C\exp (-B[\theta -C])}, \end{aligned}$$
(3)

where C and B are constant values which depend on the environment and \(\theta \) is the elevation angle. Clearly, \(\theta =sin ^{-1}\displaystyle \left( \frac{h_{i,g}}{d_{i,g}}\right) \), \(h_{i,g}\) and \(d_{i,g}\) are the height difference and distance between UAV i and the ground station g. Here, the probability of NLoS is \(P_{i,g}^{NLoS }=\displaystyle 1-P_{i,g}^{LoS }\). Therefore, the average path loss is [25]:

$$\begin{aligned}&{\bar{L}_{i,g}}=\displaystyle P_{i,g}^{LoS }L_{i,g}^{LoS }+P_{i,g}^{NLoS }L_{i,g}^{NLoS }. \end{aligned}$$
(4)

The average signal-to-noise ratio (SNR) expression between UAV i and the ground station g is:

$$\begin{aligned}&{{r}_{i,g}}=\displaystyle \frac{{{P}_{i,g}}}{{{10}^{{{{\bar{L}}}_{i,g}}/10}}\cdot {{\sigma }^{2}}}, \end{aligned}$$
(5)

where \({P}_{i,g}\) is the transmit power of UAV i, \({\sigma }^{2}\) is the noise power. Therefore, the achievable data rate can be defined as:

$$\begin{aligned}&{{R}_{i,g}}=\displaystyle {B}_{i}{{\log }_{2}}(1+{{r}_{i,g}}), \end{aligned}$$
(6)

where \(B_i\) is the transmission bandwidth of UAV i.

For modeling the A2A propagation channel, different UAVs can communicate through LoS links due to the open environment. Therefore, the path loss between UAV i and UAV j can be given by [23]:

$$\begin{aligned}&{{L}_{i,j}}=\displaystyle {{\xi }_{i,j}}+{{\psi }_{LoS }}. \end{aligned}$$
(7)

Consider orthogonal channel among UAVs, the SNR is:

$$\begin{aligned}&{{r}_{i,j}}=\displaystyle \frac{{{P}_{i,j}}}{{{10}^{{{{L}}_{i,j}}/10}}\cdot {{\sigma }^{2}}}, \end{aligned}$$
(8)

where \({P}_{i,j}\) is the transmit power of UAV i to the destination UAV j. Therefore, the achievable data rate is:

$$\begin{aligned}&{{R}_{i,j}}=\displaystyle {B}_{i}{{\log }_{2}}(1+{{r}_{i,j}}), \end{aligned}$$
(9)

Here, a directed graph is defined as \(G(\mathcal {V},\mathcal {E})\) where \(\mathcal {V}={1,2,\ldots ,N+1}\) represents the set of vertices (N UAVs and the ground station g) and \(\mathcal {E}\) denotes the set of edges that connect different nodes. A directed edge \((i,j)\in \mathcal {E}\) in graph G means that the sensing traffic flow from UAV i to j. Given a network graph \(G(\mathcal {V},\mathcal {E})\), the path \({q}_{i}\) from UAV i to ground station g is defined as:

Definition 1

The links among a sequence of nodes \(i_1,\ldots ,i_K\) (in \(\mathcal {V}\)) form the path \(q_i\), where \(i_1=i\), \(i_K=g\) and each directed link \((i_k,i_{k+1})\in G\) for each \(k\in {1,\ldots ,K-1}\).

Hence, the achievable end-to-end rate from UAV i to ground station along a path \({q}_{i}\) is the minimum of the rates achievable over \(K-1\) hops [26]:

$$\begin{aligned}&{{R}_{i,{{q}_{i}}}}(G) = \underset{k=1,\ldots ,K-1}{\mathop {\min }}\,{{R}_{{{i}_{k}},{{i}_{k+1}}}} \end{aligned}$$
(10)

where \({{R}_{{{i}_{k}},{{i}_{k+1}}}}\) is the rate of link \(({{i}_{k}},{{i}_{k+1}})\). K is the number of vertices from UAV i to ground station along path \(q_i\).

2.2 Delay cost

Consider a decode-and-forward relaying scheme, each intermediate UAV decodes and re-encodes the received signal before sending it. However, it’s a very complicated work to express the delay exactly. So by the Kleinrock approximation with each UAV as a M/D/1 queueing system, the average delay over the path \({q}_{i}\) from UAV i to the ground station [20]:

$$\begin{aligned} {{{{\tau }_{i,q_i}}(G)}}=\sum \limits _{({{i}_{k}},{{i}_{k+1}})\in {{q}_{i}}}{\left( \frac{{{\varphi }_{{{i}_{k}},{{i}_{k+1}}}}}{2{{\mu }_{{{i}_{k}},{{i}_{k+1}}}}({{\mu }_{{{i}_{k}},{{i}_{k+1}}}}-{{\varphi }_{{{i}_{k}},{{i}_{k+1}}}})}+\frac{1}{{{\mu }_{{{i}_{k}},{{i}_{k+1}}}}} \right) }, \end{aligned}$$
(11)

where \({\varphi }_{{{i}_{k}},{{i}_{k+1}}}=\lambda _{i_k}+\Lambda _{i_k}\) is the total packet arrival rate (packet/s) traversing link \({({{i}_{k}},{{i}_{k+1}})\in {{q}_{i}}}\) between UAV \({i}_{k}\) and UAV \({i}_{k+1}\) and generating from the sensing information of UAV \({i}_{k}\) and from all other UAVs that have a link with \({i}_{k}\). \({\Lambda }_{i_k}\) is defined as \({{\Lambda }_{i_k}}=\sum \nolimits _{j\in {{A}_{i_k}}}{{{\lambda }_{j}}}\) where \({A}_{i_k}\) is the set of UAVs that are connected to UAV \(i_k\) in current network graph G. \({\mu }_{{{i}_{k}},{{i}_{k+1}}}={R}_{{{i}_{k}},{{i}_{k+1}}}/\nu \) is the service rate where \({{R}_{i_k,i_{k+1}}}\) is the achievable data rate of the direct transmission between UAV \({{i}_{k}}\) and UAV \({{i}_{k+1}}\) and \(\nu \) is the packet length. Here, the link \(({{i}_{k}},{{i}_{k+1}})\) is formed successfully only if the arrival rate \({\varphi }_{{{i}_{k}},{{i}_{k+1}}}\) is less than the service rate \({\mu }_{{{i}_{k}},{{i}_{k+1}}}\). Otherwise, the delay will be infinite.

2.3 Energy cost

UAVs are constrained by limited energy, and the energy consumption is mainly dominated by the communication cost [8]. In this paper, a commonly energy consumption model is considered with the sensing, processing and communication units. The extension to the storage energy consumption will be left as our future work. The first item is the sensing energy used to sense environment. The sensing energy consumed by UAV i within per unit time is:

$$\begin{aligned}&\displaystyle E_{i}^{s}=\displaystyle {{e}_{s}}\nu {{\lambda }_{i}},&\forall i\in N, \end{aligned}$$
(12)

where \({{e}_{s}}\) is the energy to sense one bit of information.

The second energy item is the data processing costs, which is the energy to decode and re-encode the received signal. The processing energy consumed for the information of UAV i is:

$$\begin{aligned}&\displaystyle E_{i}^{p}=\displaystyle \sum \limits _{m=1}^{M}{{e}_{p}}\nu {{\lambda }_{i}},&\forall i\in N, \end{aligned}$$
(13)

where \({e}_{p}\) is the energy to process one bit of information.

Another energy item is the communication costs, which includes the consumption of transmitting energy and receiving energy. The communication energy consumed for the information of UAV i is:

$$\begin{aligned}&\displaystyle E_{i}^{c}=\displaystyle \sum \limits _{m=1}^{M}{({{P}_{tm}}+{{P}_{rm}})}T_m,&\forall i\in N, \end{aligned}$$
(14)

where \(T_m={{R}_{{{i}_{k}},{{i}_{k+1}}}}/(\nu {{\lambda }_{i}})\) is the communication time between UAV \(i_k\) and UAV \(i_{k+1}\) for the information of UAV i. \({{P}_{tm}}\) and \({{P}_{rm}}\) is the power of transmitting and receiving information.

Therefore, the total energy consumed for the information of UAV i along path \(q_i\) by sensing, processing and communication is denoted by:

$$\begin{aligned}&\displaystyle {{{{E}_{i,q_i}}(G)}}=E_{i}^{s}+E_{i}^{p}+E_{i}^{c},&\forall i\in N. \end{aligned}$$
(15)

To evaluate the performance of the network, it is important to define proper metrics. Next, the research problem and its solving algorithm are presented.

2.4 Problem formulation

Each UAV in the tree model shown in Fig. 1 wants to choose a path to the ground station. So it needs an objective function to assist making decision that takes into account the key metrics such as the achievable rate, delay and energy consumption which can have an impact on the network performance. A suitable criterion, the utility of UAV i, is defined as:

$$\begin{aligned}&{{U_{i}(G)=\displaystyle \frac{R_{i,q_i}^{\delta _i}}{{\tau _{i,q_i}^{\eta _i}}{E_{i,q_i}^{1-\delta _i-\eta _i}}}, }}&\forall i\in N. \end{aligned}$$
(16)

where \(\delta _i\) and \(\eta _i\) are the objective weights. \(U_{i}\) depends on the sensing traffic and the route to ground station.

3 Network formation game

In this section, the proposed problem is modeled as a network formation game and a distributed algorithm is provided to achieve sensing data transmission from UAVs to ground station through the tree network structure that results from the interactions among UAVs.

3.1 Game formulation

Refer to [27], we study the analytical framework of network formation games and model our problem as a network formation game \(\mathbf G(\mathbf N,\mathbf S,\mathbf U)\) where each UAVs, as independent decision player, takes action to optimize its own utility. Here, the main components \(\mathbf N,\mathbf S,\mathbf U\) are the players, their strategies, and their utility functions respectively. The player set for the proposed network formation game contains N UAVs defined as follows:

$$\begin{aligned}&\displaystyle \mathbf N=\{{{UAV }_{i}};\forall i\in (1,2,\ldots ,N)\}, \end{aligned}$$
(17)

where \(UAV _i\) represents the UAV with index i.

In this game, each UAV i interacts with others in order to take an appropriate path that connects it to the ground station. The action space of UAV i consists of the nodes (UAVs and ground station) that can use as its next hop. UAV i cannot choose the UAVs which is already connected to UAV i in the current network graph. Therefore, the strategies of UAV i is defined as [20]:

$$\begin{aligned}&{\mathbf {S}_{i}}=\displaystyle \{(i,j)|j\in \mathcal {V}\backslash (\{i\}\cup {\mathbf {A}_{i}})\}, \end{aligned}$$
(18)

where \({\mathbf {A}_{i}}=\displaystyle \{j\in \mathcal {V}\backslash \{i\}|(j,i)\in G\}\) is the set of UAVs which already have a link with UAV i. The aim of UAV i is to select the link \({{s}_{i}}\in {\mathbf {S}_{i}}\) that it wants to form. Here, each UAV has its utility function dependent on its own strategy and other UAVs’ strategies. \(U_{i}(G)\) is the utility function of UAV i which is given in (16).

In the network graph, each UAV is connected to the ground station through at most one path. This means UAV i will keep its previous strategy or replace its previous strategy with the new link \({{s}_{i}}\). The replace operation is highly dependent on the utility function of UAV i. Obviously, if UAV i has no strategy to choose, there is no path exists between UAV i and the ground station g, whether direct or multi-hop. As a result, the UAV i will be disconnected from the network and the utility of i will be zero. Therefore, there is no incentive for the UAVs to disconnect from the ground station. The network graph will always be a connected tree structure rooted at the ground station.

3.2 Pure-strategy Nash network formation algorithm

The main objective is to find an algorithm that can allow the UAVs to interact to form the tree network structure. It is desired that a UAV i plays its strategy \(s_{i}\in {\mathbf S_{i}}\) given other UAVs’ strategies \(\mathbf s_{-i}\), which is defined as [28]:

$$\begin{aligned}&\displaystyle \mathbf s_{-i}=\{s_{1},\ldots ,s_{i-1},s_{i+1},\ldots ,s_{N}\}. \end{aligned}$$
(19)

Then the network graph can be expressed as \({{G}_{s_i,\mathbf s_{-i}}}\). In game theory, Nash equilibrium (NE) [22] is a stable state that the players arrive an agreement. When all the players act according to their best response (BR), the game will converge to a state of Nash equilibrium. Here, the best response of UAV i is defined as:

Definition 2

Strategy \(s_{i}^{*} \in BR(\mathbf s_{-i})\) is a best response of UAV i if

$$\begin{aligned}&\displaystyle {{{{U}_{i}}(G_{{{s_{i}^{*},\mathbf s_{-i}^{{}}}}})\ge {{U}_{i}}(G_{{{s_{i}^{{}},\mathbf s_{-i}^{{}}}}})}},&\forall s_{i}\in {{{\mathbf S}}_{i}}, \end{aligned}$$
(20)

where \(BR(\mathbf s_{-i})\) is the set of best response of UAV i. It means that UAV i selects the link to form that maximizes its utility given other UAVs’ strategies \(\mathbf s_{-i}\). A myopic pure-strategy network formation algorithm is built based on the best response. In particular, myopic UAVs improve its utility considering only the current state of the network without taking into account the future evolution of the network. At each iterative of the algorithm, the UAVs make their decisions based on (20) in a random but sequential order. This iterative process continues until it converges.

Now, let \({\mathbf s}=(s_{1},\ldots ,s_{N})\) be defined as the strategy profile. When the algorithm converges, NE strategy profile is achieved, which is defined as:

Definition 3

\({\mathbf s}\) is a NE strategy profile if

$$\begin{aligned}&\displaystyle s_i \in BR(\mathbf s_{-i}),&\forall i\in \{1,2,\ldots ,N\}. \end{aligned}$$
(21)

The NE strategy profile guarantees to reach a stable network where there is no UAV has a motive to unilateral deviate from its strategy. That is to say no UAV can improve its utility by changing its current link if other UAVs do not deviate their strategies. Then, a pure-strategy Nash network is achieved.

3.3 Mixted-strategy Nash network formation algorithm

Obviously, there are two results during the solution of pure-strategy Nash network formation algorithm based on best response. One is the pure strategy for each UAV in which probability one is assigned to one strategy and zero to the others. However, the Nash equilibrium (NE) in pure strategies may not always exist. Another result is that some UAVs get stuck in cycle networks. An alternative version of the best response is to exploit the mixed strategies, which operates on probability distributions over strategies [20]. Therefore, let every UAV record the network topology during its turn. If a UAV finds there exists a network that has been visited, the mixed-strategy Nash network formation algorithm will be triggered. In this paper, the UAVs will use fictitious play to find the mixed Nash equilibrium based on the observation. The mixed strategy of UAV i is a distribution that assigns a probability \(p_{i}{(s)}\) to each action s and \(\sum \limits _{s\in {{\mathbf {S}}_{i}}}{{{p}_{i}}(s)}=1\).

Refer to [29], a myopic mixed-strategy network formation algorithm is built based on the fictitious play of UAVs. Similar to the pure-strategy Nash network formation algorithm, at each iterative of the algorithm, the UAVs make their decisions in a random but sequential order. The action of UAV i at time \(t+1\) is the best response to the observed strategies of others [22], can be given by \(s=\arg {{\max }_{s\in {{\mathbf {S}}_{i}}}}{{g}_{i}}(s,{{\mathbf {p}}_{-i}})\), where

$$\begin{aligned}&{{g}_{i}}(s,{{\mathbf {p}}_{-i}})=\sum \limits _{{{s}_{1}}\in {{\mathbf {S}}_{1}}}{\cdots }\sum \limits _{{{s}_{i-1}}\in {{\mathbf {S}}_{i-1}}}{\sum \limits _{{{s}_{i+1}}\in {{\mathbf {S}}_{i+1}}}{\cdots }\sum \limits _{{{s}_{N}}\in {{\mathbf {S}}_{N}}}{{}}}\nonumber \\&{{p}_{1}}({{s}_{1}}),\cdots ,{{p}_{i-1}}({{s}_{i-1}}),{{p}_{i+1}}({{s}_{i+1}}),\cdots ,{{p}_{N}}({{s}_{N}}){{U_{i}^{t+1}(G_{{{s,\mathbf s_{-i}^{{}}}}})}} \end{aligned}$$
(22)

The UAVs update the probabilities of strategies based on the value at last time and the current action as follows:

$$\begin{aligned}&{\mathbf {p}}_{i}^{t+1}=\displaystyle {\mathbf {p}}_{i}^{t}+\frac{1}{t+1}\left( {\mathbf {v}}_{i}^{t+1}-{\mathbf {p}}_{i}^{t} \right) , \end{aligned}$$
(23)

where \(\mathbf {p}_{i}^{t}={{[p_{i}^{t}(s)]}_{s\in {\mathbf {S}_{i}}}}\) is the empirical probability that UAV i chooses the action s at the t-th iteration of the algorithm. \(\mathbf {v}_{i}^{t+1}={{[v_{i}^{t+1}(s)]}_{s\in {\mathbf {S}_{i}}}}\) is an \({\mathbf S_i}\)-dimensional vector with \(v_{i}^{t+1}(s)=1\) if UAV i chooses the action s at time \(t+1\), and \(v_{i}^{t+1}(s)=0\), otherwise.

Each UAV i selects its best response action according to the observed decisions of the other UAVs. Hence, at any given time, it ensures to form a tree architecture. This iterative process continues until it converges to the mixed NE which is defined as:

figure f

Definition 4

A mixed strategy NE of the game is a mixed strategy profile \({{\mathbf {p}^{*}}}=(\mathbf {p_{1}^{*}},\mathbf {p_{2}^{*}},\ldots ,\mathbf {p_{N}^{*}})=(\mathbf p_{i}^{*},\mathbf p_{-i}^{*})\) such that [22]

$$\begin{aligned}&\displaystyle {{\tilde{U}}_{i}}({\mathbf {p}}_{i}^{*},{\mathbf{p}}_{-i}^{*})\ge \displaystyle {{\tilde{U}}_{i}}(\mathbf p_{i}^{{}},\mathbf p_{-i}^{*}),&\forall i\in N, \end{aligned}$$
(24)

where

$$\begin{aligned}&{{{{\tilde{U}}_{i}}(\mathbf p_{i}^{{}},\mathbf p_{-i}^{{}})=\displaystyle E({{U}_{i}})=\sum \limits _{{\mathbf s}\in {\mathbf {S}}}{(\underset{j\in N}{\mathop {\prod }}\,p_{{j}}{({s}_{j})})}{{U}_{i}}(G_{{{s,\mathbf s_{-i}^{{}}}}})}}. \end{aligned}$$
(25)

It’s the expected utility of UAV i when selecting the mixed strategy \(\mathbf p_{i}^{{}}\) and \({\mathbf {S}}=({\mathbf S_i},{\mathbf S_{-i}})\).

It’s difficult to have an analytical proof for the convergence of this algorithm. The unified convergence proofs have been provided that whenever fictitious play converges, it converges to a Nash equilibrium [30]. However, it’s common to use fictitious play to find the mixed-strategy Nash network even though no analytical proof [22].

A summary of the proposed algorithm is given in Algorithm 1. The algorithm consists three phases: Initial state, the proposed network formation algorithm, and information transmission. In the first phase, the starting network is formed. After that, the network formation algorithm begins to seek the Nash network among UAVs. Once the game reaches steady-state, the UAVs start their transmission.

4 Simulation results and analysis

For simulations, a 5 km \(\times \) 5 km square area with the ground station at (0, 0) is considered. The observed areas are assigned to each UAV located at the altitude of 100 m. The path loss exponent is set to \(\alpha =2\), the system carrier frequency is set to \(f_c=2\ GHz \), the additional attenuation factors are set to \({\psi }_{LoS }=5\ {\text{dB}} \) and \({\psi }_{NLoS }=20\ {\text{dB}} \), the values of C and B are set to 11.9 and 0.13 according to [23], and the noise variance \({{\sigma }^{2}}=-\,90\ {\text{dBm}}\). The UAVs are considered as having a transmit power of \(20\ {\text{dBm}} \), an arrival rate of 50 packets/s. All packets are considered of size 256 bits and the tradeoff parameters \(\delta _i=\delta =0.5\), \(\eta _i=\eta =0.3\). The energy parameters are \({{e}_{s}}=50\ {\text{nJ/bit}} \), \({{e}_{p}}=10\ {\text{nJ/bit}}\). The power consumption for receiving information is \(15\ {\text{dBm}} \).

Fig. 2
figure 2

A snapshot of a final tree structure resulting from the proposed algorithm for a network with \(N=10\) UAVs

In Fig. 2, we deploy \(N=10\) UAVs to observe or collect information to the ground station. The network formation game starts with the star topology with all the UAVs connected directly to the ground station. Figure 2 shows that a tree structure network results from our proposed algorithm. Most of the UAVs connect to the ground station by multi-hop. Moreover, not only the distance affects the decisions of UAVs, but also the number of hops does. For example, UAV 3 connects to UAV 5 directly, although UAV 1 is closer. The reason is that the path for UAV 3 along UAV 1 is more congested and thus decreases its utility. In brief, Fig. 2 shows how the UAVs select their strategies to form a tree topology.

Fig. 3
figure 3

The utility variation of relevant UAVs due to the movement of UAV 1 along the x-axis in the positive direction

In Fig. 3, the effect of task change on the network structure is presented. We consider the network of Fig. 2 and assume the interesting area of UAV 1 is changed. Then UAV 1 moves along the positive x-axis direction to reach the new task area while the other UAVs remain static. The utility variation of relevant UAVs are shown in Fig. 3. At the beginning of movement, the utility of UAV 7 which served by UAV 1 decreases, while the utility of UAV 1 increases since the distance between UAV 1 and UAV 5 decreases. After UAV 1 removes 0.4 km, UAV 3 finds its utility could improve by connecting with UAV 1. As the movement continues, UAV 7 decides to connect with UAV 5 instead of UAV 1 since this can provide a better utility. At the same time, the utility of UAV 1 starts to drop as it away from UAV 5. After moving 1.8  km, UAV 3 maximizes its utility by replacing its current connection with UAV 1 and establishing a link with UAV 5. In a word, the UAVs can make decisions to adapt the changing environment.

Fig. 4
figure 4

Frequency of switch operations per minute when all UAVs moving with different velocities in random directions

Figure 4 shows that the average number of switch operations when all UAVs moving with different velocities in random directions over a period of 5 min. In order to adapt to the changing environment, the proposed algorithm is repeated every 30 s. From Fig. 4, we can see that the average number of switch operations increases as the velocity of the UAVs increases. The reason is, as mobility increases, the possibility of changes in the network structure increases. As a result, the UAVs take actions more frequently. The greater the number of UAVs, the more the chances of replacing the current link, hence the case of \(N=20\) has a higher number of switch actions than the case of \(N=10\). In this regard, the average number of switch operations per minute varies from around 5.7 at 10 km/h to around 22 at 70 km/h for \(N=20\), while for \(N=10\), it’s from 1.6 at 10 km/h to around 9 at 70 km/h. In brief, Fig. 4 provides that the UAVs can adapt the changing network topology and make appropriate decisions through the proposed algorithm.

Fig. 5
figure 5

Minimum, average and maximum number of iterations till convergence

Table 1 The average number of iterations till convergence based on the proposed algorithm and that of the two-relay scheme

Figure 5 shows the minimum, average and maximum number of iterations needed till convergence of the proposed algorithm for different number of UAVs. Each UAV decides whether replacing the strategy during its turn in every iteration. This figure shows that, the total number of iterations required to get the stable network topology increases as the number of UAVs increases. The reason stems from the fact that the choices for every UAV increase, thus more iterations are required for the convergence. For instance, the maximum, average and minimum number of iterations vary from 3, 2.4, 2 at \(N=5\) UAVs up to 5, 3.6, 3 at \(N=25\) UAVs. Figure 5 demonstrates that, after a number of iterations, our proposed algorithm could converge and a stable topology is achieved. The algorithm could applicable to relatively large networks from the convergence speed. Moreover, Table 1 compares the average number of iterations required resulting from our proposed algorithm and the two-relay scheme. For the two-relay scheme, each UAV splits its traffic on average by choosing two relays to reach the destination. From Table 1, it can be seen that the proposed algorithm requires a lower iterations for different number of UAVs. This is due to the increase in the number of strategies on the two-relay scheme.

Fig. 6
figure 6

Performance assessment of the proposed network formation algorithm, in terms of average utility per UAV as compared to nearest neighbor algorithm and the direct transmission

Figure 6 shows the average achieved utility per UAV as the number of UAVs in the network increases. Here, the proposed algorithm is compared against the nearest neighbor algorithm which each UAVs selects its nearest neighbor to connect to, as well as the direct transmission scheme. In Fig. 6, as the number of UAVs increases, the average utility per UAV decreases. This is due to the fact that, the available bandwidth per UAV depends on the number of UAVs. The increasing number of UAV will result to the decreasing bandwidth and average rate of each. Another factor is the UAVs that having bad channel conditions with the ground station could form links with other UAVs to get better communication. The higher probability of LoS communication among UAVs brings more resistant to the attenuation. Figure 6 demonstrates that the proposed algorithm presents a significant performance gains over the other two schemes.

Fig. 7
figure 7

Average utility per UAV achieved by the proposed algorithm, nearest neighbor algorithm and direct transmission for a network with \(N=10\) UAVs as the tradeoff parameter \(\delta \) varies

In Fig. 7, the average utility per UAV achieved is presented for a network with \(N=10\) UAVs as the the tradeoff parameter \({\delta }\) varies. Here, same weight is put on the delay and energy consumption. The proposed algorithm is compared against the direct transmission and nearest neighbor algorithm which used frequently. In this figure, as the tradeoff parameter increases, the performance of all schemes improves and the proposed algorithm outperforms others. This is because the proportion of achievable rate in the utility is growing. For small \(\delta \), the UAVs are highly sensitive to delay and energy consumption and tend to form a link with ground station by small number of hops. As \(\delta \) increases, the network becomes more delay and energy consumption tolerant. This leads to the probability of using multi-hop transmission and the average achievable rate per UAV increases.

Fig. 8
figure 8

Mixed-strategy algorithm: a UAVs deployment; b convergence time

Figure 8 provides the convergence time of mixed-strategy Nash network formation algorithm based on fictitious play. Through a lot simulations, with the pure-strategy algorithm, cycle networks appear when UAVs are deployed as Fig. 8a. As a result, the mixed-strategy algorithm is triggered. Some UAVs, not all, will play their mixed strategies. Figure 8b shows the probabilities of different strategies of UAVs. We can see that the probabilities vary a lot at the beginning. And the algorithm gradually converges after several iterations.

In Fig. 9, we show the proposed algorithm may admit multiple equilibria. The total number of possible network trees is \((N+1)^{(N-1)}\) with a team of N UAVs. Therefore, for large networks, choosing the most efficient equilibrium is computationally intractable by finding all possible network trees. To simplify the simulation without losing generality, we consider \(N=3\) case and \(N=4\) case, with 16 and 125 possible network trees respectively. Figure 9a shows the number of Nash equilibria (only pure strategy), over about 10,000 random network settings. From the results, it can be observed that the maximum number of pure-strategy Nash networks is two, which is much less than the total number of possible network trees. Moreover, by assessing the network settings with multiple equilibria in \(N=4\) case, the average utility of different equilibria has been presented in Fig. 9b. The performance of different equilibria is similar. However, design mechanisms could improve the efficiency of equilibrium [22]. Thus, a method is provided in the following simulation.

Fig. 9
figure 9

Nash networks: a the number of Nash equilibria, over 10,000 random network settings, in \(N=3\) case and \(N=4\) case; b average utility of different equilibria by assessing the network settings with multiple equilibria in \(N=4\) case

Fig. 10
figure 10

Performance assessment in no incentive case and incentive case as the number of UAVs increases: a average utility; b average and average maximum number of hops

We consider a mechanism which gives a positive incentive to UAV by the number of packets it relays. The performance of incentive case and no incentive case is further assessed in Fig. 10 as the number of UAVs increases. From the results, it can be observed that, as the number of UAVs increases, the average utility of two cases is similar. However, in no incentive case, the average maximum and average number of hops vary, respectively, from 2.8 and 2 at \(N = 5\), up to 4.5 and 2.9 at \(N = 25\). While in incentive case, they vary from 2.6 and 1.9 at \(N = 5\), up to 4.3 and 2.8 at \(N = 25\). The number of hops in incentive case is a little lower than no incentive case. Consequently, through the incentive mechanism, the efficiency of equilibrium could be improved.

Fig. 11
figure 11

Performance assessment under different selflessness \(\epsilon \): a average utility; b number of hops

Figure 11 shows the performance assessment achieved under different selflessness \(\epsilon \), which any UAV is willing to accept a connection to relay the packets that does not decrease its utility by more than \(\epsilon \). From Fig. 11a and b, it can be observed that the average utility and the number of hops decrease as the level of selfishness increases. The performance of the proposed algorithm gets closer to the selflessness with 5%. Therefore, the utility of UAVs will not decrease a lot. For instance, the average number of hops varies, respectively, from 1.9 at \(N = 5\), up to 3.2 at \(N = 25\) in our proposed algorithm. While for selfless with 0.1% case, it varies from 1.8 at \(N = 5\), up to 3 at \(N = 25\).

5 Conclusion

Inspired by cooperative communication, considering the achievable rate, delay and energy consumption, we design a game framework for UAVs. To solve the game, the Nash network formation algorithm was proposed which consists of a pure-strategy Nash network formation algorithm where each UAV can take the best response to the observed strategies of others, and a mixed-strategy Nash network formation algorithm where each UAV assigns the probability to its actions. In this way, the UAVs can autonomously learn to form the A2A and A2G links. By our simulations, we show that the UAVs can replace their strategies appropriately. The results also demonstrated that the proposed algorithm achieves notable gains compared with other approach.