Introduction

Video applications, such as video streaming, videoconferencing, remote teaching, and surveillance, have become the major applications deployed over wireless networks. Video applications are bandwidth intensive and delay sensitive and hence require efficient allocation of spectrum resources among the users and efficient scheduling of each user’s video packets based on its allocated resources.

One promising physical-layer technology for improving spectrum efficiency is cognitive radio, in which secondary users (SUs) can utilize the idle spectrum when primary users (PUs) are inactive. This potential for high spectrum efficiency makes cognitive radio an attractive physical-layer technology for video applications. Nevertheless, several challenges arise in efficiently deploying video applications over cognitive radio networks.

First, video applications are delay-sensitive, namely, the video packets have to be received before strict deadlines for successful decoding. Therefore, video applications require not only high spectrum efficiency but also efficient scheduling of video packets.

In addition, there is interdependency across video packets, which makes the issue of delay sensitivity more challenging. Specifically, the successful decoding of some video packets may depend on the successful decoding of others. In other words, even if a video packet is received before its deadline, it may not be decodable if some of the packets it depends on were not received by their deadlines. This interdependency results in temporal coupling of the packet scheduling decisions. Therefore, we need foresighted video packet scheduling that aims at maximizing the long-term video quality, instead of the instantaneous video quality.

Moreover, the delay sensitivity and interdependency mentioned above require not only efficient packet scheduling but also efficient allocation of spectrum resources. More specifically, we need foresighted resource allocation among the users in the network, in order to maximize the long-term video quality.

Furthermore, the unknown dynamic environment requires users to make resource allocation and packet scheduling decisions while learning the unknown dynamics. Here the unknown dynamics include incoming video traffic, the channel quality, and the PU activities.

Finally, the users in cognitive radio networks are autonomous and may aim to maximize their own video quality (i.e., they are self-interested). It is more challenging to design an efficient resource allocation mechanism for self-interested users, because they may misreport their information (e.g., their video traffic and channel quality). In this case, the resource allocation mechanism has to be strategy-proof.

In this chapter, we introduce our solutions to the aforementioned challenges in multiuser wireless video transmission over cognitive radio networks. Specifically, we introduce a framework for designing foresighted resource allocation mechanisms and packet scheduling algorithms [1,2,3,4,5,6,7]. The introduced resource allocation mechanisms allow wireless users to optimize their transmission decisions while learning the unknown dynamics of the environment (e.g., incoming traffic, channel quality, primary user activities). We also describe variations of the mechanisms that are suitable for self-interested users [1,2,3,4,5]. These mechanisms ensure efficient video resource allocation even when the users are self-interested and aim to maximize their individual video quality. The foresighted resource allocation mechanisms introduced in this chapter are built upon our theoretical advances in multiuser Markov decision processes, reinforcement learning, and dynamic mechanism design.

The rest of this chapter is organized as follows. We give a literature review in section “Related Work.” We introduce the system model in section “General Model for Video Applications over Cognitive Radio” and then formulate the design problem in section “The Design Problem.” We describe the introduced solutions in section “Optimal Foresighted Video Transmission.” We also briefly describe the strategy-proof variations of the solutions in section “Strategy-Proof Resource Allocation Mechanisms.” Finally, we conclude the chapter in section “Conclusion and Future Directions.”

Related Work

Multimedia Applications over Cognitive Radio Networks

A plethora of recent works [1,2,3,4,5,6,7,8,9,10,11,12,13,14] propose distinct solutions to optimize the video quality. Some works [8,9,10,11,12,13,14,15,16] assume that the users are myopic, namely, that they only maximize their instantaneous video quality over a given time interval without considering the impact of their actions on the long-term video quality. They cast the problem in a network utility maximization (NUM) framework to maximize the instantaneous joint video quality of all the users and apply the NUM framework repeatedly when the channel conditions or video traffic characteristics change. However, since the users optimize their transmission decisions myopically, their long-term average performance is inferior to the performance achieved when the users are foresighted [17]. The works that do consider foresighted decisions focus solely on a single foresighted user making sequential transmission decisions (e.g., packet scheduling and retransmissions) [17]. However, these single-user solutions do not discuss how to allocate resources among multiple users, or how this allocation is affected by, and in turn affects, the foresighted scheduling decisions of individual users. Static resource allocations, which are often assumed in the works studying the foresighted decisions of a single user, have been shown to be suboptimal compared to solutions that dynamically allocate resources among multiple users [6].

In contrast, our introduced solutions, based on [1,2,3,4,5,6,7], make foresighted resource allocation and packet scheduling decisions over multiple video users.

Table 1 summarizes the above discussions. Note that the optimality shown in the last column of Table 1 indicates whether the solution is optimal for the long-term network utility (i.e., the joint long-term video quality of all the users in the network).

Table 1 Related works on video (the first two rows are works introduced in this chapter)

Last but not least, there is a plethora of works that address other important issues in cognitive radio networks with multimedia applications, such as spectrum handoff [18], routing [19], spectrum sensing [20], and applications in smart grid [21]. These works are outside the scope of this chapter.

Theoretical Frameworks

Single-user foresighted decision-making in a dynamically changing environment has been studied and formulated as a Markov decision process (MDP). Foresighted decision-making in a dynamically changing environment can also be approached using the Lyapunov optimization framework [22]. However, the Lyapunov optimization framework cannot make optimal decisions for video streaming, since it disregards the specific interdependency and distortion impact of video traffic [23].

Table 2 summarizes the above discussions about existing theoretical frameworks.

Table 2 Related theoretical frameworks (the first row is works introduced in this chapter)

General Model for Video Applications over Cognitive Radio

We first present a general model for multiuser wireless video transmission in cognitive radio networks. Then we present a commonly used packet-level model, as in [6, 14, 17], as an instantiation of our general model.

The General Model

We consider a cognitive radio network with a network manager, indexed by 0, and a set \(\mathcal {I}\) of I wireless video users, indexed by i = 1, …, I. Time is slotted at t = 0, 1, 2, …. In the rest of the chapter, we will put the user index in the superscript and the time index in the subscript of variables. The multiuser wireless video transmission system is described by the following features.

States

Each user i has a finite state space S i, from which a state s i is realized and revealed to user i at the beginning of each time slot. The state s i may consist of several components, such as the video traffic state and the channel state. An example of a simplified video traffic state is the set of video frame types (I, P, or B frames) available for transmission and the number of packets in each available video frame. Note that the video traffic in our model can come from video sequences that are encoded either in real time or offline and stored in memory before transmission. An example channel state is the channel quality reported to the application layer by the lower layers. The network manager has a finite state space S 0 that describes the status of the resources in the network. An example resource state is the available idle bandwidth.

Actions

At each state s i, each user i chooses an action a i ∈ A i(s i). For example, an action can be how many packets within each available video frame should be transmitted. We allow the sets of actions available under different states to differ, in order to incorporate the minimum video quality requirements discussed later.

Payoffs

Each user i has a payoff function \(u^i: S^i \times A^i \rightarrow \mathbb {R}\). The payoff function u i is concave in the action under any state. A typical payoff can be the distortion impact of the transmitted packets minus the cost of energy consumption in transmission.

State Transition

Each user i’s state transition is Markovian and is described by \(p^i(s^{i\prime}\,|\,s^i, a^i)\), where \(p^i(\cdot\,|\,s^i,a^i) \in \Delta(S^i)\) and \(\Delta(S^i)\) denotes the set of probability distributions over S i.

Resource Constraints

Given the status of the resource (i.e., the network manager’s state s 0), we can write the (linear) resource constraint as

$$f(s^0, a^1, \ldots, a^I) = \sum_{i=1}^I f^i(s^0) \cdot a^i + f^0(s^0) \leq 0,$$

where f i(s 0) is the coefficient of user i’s action under state s 0 and f 0(s 0) is a constant term (e.g., the negative of the total available bandwidth; see the example in (2) below). When a i is a vector, f i(s 0) is a vector of the appropriate length, and f i(s 0) ⋅ a i is the inner product of the two vectors.

A variety of multiuser wireless video transmission systems can be modeled as special cases of our general model. Next, we present a packet-level video transmission model as an example.

An Example Packet-Level Video Transmission Model

Packet-level video transmission models have been proposed in a variety of related works, including [6,7,8,9,10,11,12,13,14]. In the following, we briefly describe the model based on [6,7,8,9,10,11,12,13,14] and refer interested readers to [6,7,8,9,10,11,12,13,14] for more details.

We first consider a specific video user i and hence drop the superscript before we describe the resource constraints. The video source data is encoded using an H.264 or MPEG video coder under a group of pictures (GOP) structure: the data is encoded into a series of GOPs, indexed by g = 1, 2, …, where one GOP consists of N data units (DUs). Each DU n ∈{1, …, N} in GOP g, denoted \(DU_n^g\), is characterized by its size \(l_n^g \in \mathbb {N}_+\) (i.e., the number of packets in it), distortion impact \(q_n^g \in \mathbb {R}_+\), delay deadline \(d_n^g \in \mathbb {N}_+\), and dependency on the other DUs in the same GOP. The dependency among the DUs in one GOP comes from encoding techniques such as motion estimation/compensation. In general, if \(DU_n^g\) depends on \(DU_m^g\), we have \(d_n^g \geq d_m^g\) and \(q_n^g \leq q_m^g\), namely, \(DU_m^g\) should be decoded before \(DU_n^g\) and has a higher distortion impact than \(DU_n^g\) [17]. Note that in the case of scalable video coding, there is no dependency among the DUs, and the following representation of the model can be greatly simplified. We will keep the dependency for generality in our exposition.

Among the above characteristics, the distortion impact \(q_n^g\), the delay deadline \(d_n^g\), and the dependency are deterministic and fixed for the same DUs across different GOPs (e.g., \(q_n^g=q_n^{g+1}\)) [6, 17]. As in [6], the sizes of all the DUs are independent random variables, and the sizes of the nth DUs in different GOPs are identically distributed.

States

Each user’s state consists of the traffic state T t and the channel state h t. We describe the traffic state T t first. At time slot t, as in [6, 14, 17], we assume that the wireless user will only consider for transmission the DUs with delay deadlines in the range of [t, t + W), where W is referred to as the scheduling time window (STW). Following the model in [6, 17], at time slot t, we introduce context to represent the set of DUs that are considered for transmission, i.e., whose delay deadlines are within the range of [t, t + W). We denote the context by \(C_t = \left \{DU^g_j|d^g_j \in [t, t + W)\right \}\). Since the GOP structure is fixed, the transition from context C t to C t+1 is deterministic. An illustration of the context is given in Fig. 1.

Fig. 1
figure 1

Illustration of group of pictures (GOP), data unit (DU), and the context. Since the scheduling time window is W = 2, the contexts in different time slots are \(C_t=\{DU_1^g,DU_2^g,DU_3^g\}\), \(C_{t+1}=\{DU_2^g,DU_3^g,DU_4^g,DU_5^g\}\), \(C_{t+2}=\{DU_4^g,DU_5^g,DU_1^{g+1}\}\), \(C_{t+3}=\{DU_1^{g+1},DU_2^{g+1},DU_3^{g+1}\}\), and so on

Given the current context C t, we let x t,DU denote the number of packets in the buffer associated with a DU in C t. We denote the buffer state of the DUs in C t by \(x_t = \left \{x_{t,DU} | DU \in C_t\right \}\). The traffic state T t at time slot t is then T t = (C t, x t), where the context C t represents which DUs are available for transmission, and the buffer state x t represents how many packets each available DU has left in the buffer.

Next we describe the channel state h t. At each time slot t, the wireless user experiences a channel condition \(h_t \in \mathcal {H}\), where \(\mathcal {H}\) is the finite set of possible channel conditions. We assume that the wireless channel is slow fading (i.e., remains the same in one time slot) and that the channel condition h t can be modeled as a finite-state Markov chain [24].
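For concreteness, a simple instance of such a finite-state Markov chain is a two-state (Gilbert-Elliott-style) channel with a “good” and a “bad” condition; the transition probabilities below are illustrative values, not taken from the cited works:

$$\mathcal{H}=\{h_{\mathrm{good}},\,h_{\mathrm{bad}}\},\qquad P=\begin{pmatrix} 0.9 & 0.1\\ 0.4 & 0.6\end{pmatrix},$$

where the (j, k) entry of P is the probability of moving from the jth to the kth channel condition between consecutive time slots.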

In summary, the state of a user at each time slot t is s t = (C t, x t, h t), which includes the current context, buffer state, and channel state.

(Packet Scheduling) Actions

At each time slot t, the user decides how many packets to transmit from each DU in the current context. The decision is represented by a t(s t) = {y t,DU|DU ∈ C t, y t,DU ∈ [0, x t,DU]}, where y t,DU is the number of packets transmitted from the DU.

Payoffs

As in [17], we consider the following instantaneous payoff at each time slot t (the payoff function can be easily extended within our framework to include additional features of the model; for example, when there are packet losses, we can modify the first term to be the expected distortion reduction given the packet loss rate, or modify the second term to account for the additional energy consumed by packet retransmissions):

$$u(s_t,a_t) = \sum_{DU \in C_t} q_{DU}\, y_{t,DU} - \beta \cdot \rho\left(h_t, \|a_t\|_1\right),$$

(1)

where the first term \(\sum_{DU \in C_t} q_{DU} y_{t,DU}\) is the instantaneous video quality, namely, the distortion reduction obtained by transmitting the packets of the DUs in the current context, and the second term \(\beta \cdot \rho\left(h_t, \|a_t\|_1\right)\) represents the disutility of the energy consumed by transmitting the packets. Since the packet scheduling action a t is a vector with nonnegative components, we have \(\|a_t\|_1=\sum_{DU \in C_t} y_{t,DU}\), namely, \(\|a_t\|_1\) is the total number of transmitted packets. As in [17], the energy consumption function ρ(h, ∥a∥1) is assumed to be convex in the total number of transmitted packets ∥a∥1 given the channel condition h. An example of such a function is \(\rho(h,\|a\|_1)=\sigma^2(e^{2\|a\|_1 b}-1)/h\), where b is the number of bits in one packet [25]. The payoff function trades off distortion reduction against energy consumption, with their relative importance characterized by the trade-off parameter β > 0. In the simulations, different values of β are used to illustrate this trade-off.
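To make the payoff concrete, the following is a minimal Python sketch that evaluates Eq. (1) for one time slot; the function and variable names, as well as the parameter values (b, σ², β), are our own illustrative choices.

```python
import math

def energy_cost(h, n_packets, b=0.5, sigma2=1.0):
    # Example convex energy function from the text:
    # rho(h, n) = sigma^2 * (exp(2 * n * b) - 1) / h.  Toy parameter values.
    return sigma2 * (math.exp(2 * n_packets * b) - 1.0) / h

def payoff(q, x, y, h, beta=0.1):
    """Instantaneous payoff u(s_t, a_t) of Eq. (1): distortion reduction
    minus beta times the energy cost of sending ||a_t||_1 packets.

    q: dict DU -> distortion impact q_DU of the DUs in the current context
    x: dict DU -> packets left in the buffer (x_{t,DU})
    y: dict DU -> packets scheduled for transmission (y_{t,DU})
    """
    assert all(0 <= y[du] <= x[du] for du in q)   # y_{t,DU} in [0, x_{t,DU}]
    distortion_reduction = sum(q[du] * y[du] for du in q)
    total_packets = sum(y.values())               # ||a_t||_1
    return distortion_reduction - beta * energy_cost(h, total_packets)

# Two DUs in the current context, channel condition h = 0.8:
print(payoff(q={"DU1": 3.0, "DU2": 1.5},
             x={"DU1": 4, "DU2": 2},
             y={"DU1": 2, "DU2": 1},
             h=0.8))
```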

The Resource Constraint

Suppose that the users access the channels in a frequency-division multiple access (FDMA) manner. The total bandwidth B is shared by the users. We assume that each user i uses adaptive modulation and coding (AMC) based on its channel condition. In other words, each user i chooses a data rate \(r_t^i\) under the channel state \(h_t^i\). Note that the rate selection is done by the physical layer and is not a decision variable in our framework. Then as in [8], we have the following resource constraint:

$$\sum_{i=1}^I \frac{\|a_t^i\|_1\, b}{r_t^i(h_t^i)} \leq B,$$

(2)

where \(\frac{\|a_t^i\|_1 \cdot b}{r_t^i(h_t^i)}\) is the bandwidth needed to transmit \(\|a_t^i\|_1 \cdot b\) bits at the data rate \(r_t^i(h_t^i)\).

In this model, the network manager’s state s 0 is the collection of channel states, namely, s 0 = (h 1, …, h I). The information about the channel states is fed back from the users to the network manager. We can write the constraint compactly as the linear constraint \(f(s_t^0, a_t^1, \ldots , a_t^I)\leq 0\) with \(f^i(s_t^0) = \frac {b}{r_t^i(h_t^i)}\), i = 1, …, I, and \(f^0(s_t^0) = -B\).
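As a small sketch of this constraint (the variable names are ours), the compact linear form can be checked as follows:

```python
def feasible(actions, rates, B, b=0.5):
    # Left-hand side of Eq. (2): sum_i ||a_t^i||_1 * b / r_t^i(h_t^i), i.e.,
    # the compact form f(s^0, a^1, ..., a^I) <= 0 with f^i(s^0) = b / r^i
    # and f^0(s^0) = -B.  actions[i] is user i's scheduling vector (a dict),
    # rates[i] is user i's AMC data rate under its current channel state.
    used = sum(sum(a.values()) * b / r for a, r in zip(actions, rates))
    return used - B <= 0
```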

The Design Problem

Each user makes decisions based on its state s t. Hence, each user i’s strategy can be defined as a mapping \(\pi ^i : S^i \rightarrow \cup _{s^i} A^i(s^i)\), where A i(s i) is the set of actions available under state s i. We allow the set of available actions to depend on the state, in order to capture the minimum video quality guarantee. For example, at any time, user i has a minimum distortion impact reduction requirement D i, formulated as

$$A^i(s^i) = \Big\{ a^i \,\Big|\, \sum_{DU \in C} q_{DU}\, y_{DU} \geq D^i \Big\}.$$
The users aim to maximize their expected long-term payoff. Given its initial state \(s_0^i\), each user i’s strategy π i induces a probability distribution over the sequences of states \(s_1^i,s_2^i,\ldots \), and hence a probability distribution over the sequences of instantaneous payoffs \(u_0^i, u_1^i,\ldots \). Taking expectation with respect to the sequences of payoffs, we have user i’s long-term payoff given the initial state as

$$U^i(\pi^i|s_0^i) = (1-\delta)\, \mathbb{E}\left[\sum_{t=0}^{\infty} \delta^t\, u^i\big(s_t^i, \pi^i(s_t^i)\big)\right],$$

(3)

where δ ∈ [0, 1) is the discount factor.

The design problem can be formulated as

$$\begin{array}{cl} \displaystyle \max_{\pi^1,\ldots,\pi^I} & \displaystyle \sum_{s_0^1,\ldots,s_0^I} \sum_{i=1}^I U^i(\pi^i|s_0^i) \\ s.t. & \pi^i(s^i) \in A^i(s^i),~\forall i,~\forall s^i, \\ & f\big(s^0, \pi^1(s^1), \ldots, \pi^I(s^I)\big) \leq 0,~\forall (s^0, s^1, \ldots, s^I). \end{array}$$

(4)

Note that the design problem (4) is a weakly coupled MU-MDP as defined in [26]. It is an MU-MDP because there are multiple users making foresighted decisions. The MU-MDP is coupled because the users influence each other through the resource constraints (namely, the choice of one user’s action depends on the choices of the other users). However, it is only weakly coupled, because the coupling is through the resource constraints alone and because one user’s instantaneous payoff u i(s i, a i) is not affected by the other users’ actions a j. It is this weak coupling that enables us to decompose the multiuser problem into multiple single-user problems through prices. Such a decomposition of weakly coupled MU-MDPs has been studied in a general setting [26] and in wireless video transmission [6], both adopting a dual decomposition approach based on a uniform price (i.e., the same Lagrangian multiplier for the resource constraints under all states).

Note also that we sum up the network utility \(\sum _{i=1}^I U^i(\pi ^i|s_0^i)\) under all the initial states \((s_0^1,\ldots ,s_0^I)\). This can be interpreted as the expected network utility when the initial state is uniformly distributed. The optimal stationary strategy profile that maximizes this expected network utility will also maximize the network utility given any initial state.

The design problem (4) is very challenging and has never been solved optimally. To see why, suppose that there existed a central controller that knew the complete information of the system (i.e., the states, the state transitions, and the payoff functions) at each time slot. This central controller could solve the above problem (4) as a centralized single-user MDP (e.g., using the well-known value iteration or policy iteration methods) and obtain the optimal strategy profile \(\pi^\star\) and the optimal value function \(U^\star\). However, the multiuser wireless video system we discussed is inherently informationally decentralized, and no entity in the network possesses the complete information. Moreover, the computational complexity of solving (4) by a single entity is prohibitively high. Hence, our goal is to develop an optimal decentralized algorithm that converges to the optimal solution.

Optimal Foresighted Video Transmission

In this section, we show how to determine the optimal foresighted video transmission policies. We propose an algorithm that allows each entity to make decisions based on its local information and on limited information exchange between the base station (BS; i.e., the network manager) and the users. Specifically, in each time slot, the BS sends resource prices to each user, and each user sends the BS the total number of packets it wants to transmit. The BS keeps updating the resource prices based on the users’ resource usage and converges to the optimal resource prices, under which the users’ individually optimal decisions achieve the optimal network utility.

Decoupling of the Users’ Decision Problems

Each user aims to maximize its own long-term payoff \(U^i(\pi ^i|s_0^i)\) subject to the constraints. Specifically, given the other users’ strategies π i = (π 1, …, π i−1, π i+1, …, π I) and states s i = (s 1, …, s i−1, s i+1, …, s I), each user i solves the following long-term payoff maximization problem:

$$\begin{array}{rcl} \pi^i = &\displaystyle \arg\max_{\tilde{\pi}^i} &\displaystyle U^i(\tilde{\pi}^i|s_0^i) \\ &s.t. &\tilde{\pi}^i(s^i) \in A^i(s^i), ~\forall s^i, \\ & &f\big(s^0,\tilde{\pi}^i(s^i),\boldsymbol{\pi}^{-i}(\boldsymbol{s}^{-i})\big) \leq 0. \end{array}$$

(5)

Assuming that the user knows all the information (i.e., the other users’ strategies π i and states s i), user i’s optimal value function should satisfy the following:

$$V^i(s^i) = \max_{\substack{a^i \in A^i(s^i):\\ f(s^0, a^i, \boldsymbol{\pi}^{-i}(\boldsymbol{s}^{-i})) \leq 0}} \left\{ (1-\delta)\, u^i(s^i,a^i) + \delta \sum_{s^{i\prime}} p^i(s^{i\prime}|s^i,a^i)\, V^i(s^{i\prime}) \right\},~\forall s^i.$$

(6)

Note that the above equation would be the Bellman equation if user i knew the other users’ strategies π −i and states s −i as well as the BS’s state s 0 (i.e., the channel states of all the users). However, such information is never known to any particular user. Without it, a user cannot solve the above decision problem, because the resource constraint contains unknown variables. Hence, we need to separate the influence of the other users’ decisions from each user’s decision problem.

One way to decouple the interaction among the users is to remove the resource constraint and add it as a penalty to the objective function. Denote the Lagrangian multiplier (i.e., the “price”) associated with the constraint under state s 0 as λ 0(s 0). Then the penalty at state s 0 is

$$- \lambda^0(s^0) \cdot f(s^0, a^1, \ldots, a^I) = - \lambda^0(s^0) \cdot \sum_{i=1}^I f^i(s^0) \cdot a^i.$$

Since the term \(- \lambda^0(s^0) \cdot \sum_{j \neq i} f^j(s^0) \cdot a^j\) is a constant from user i’s perspective, we only need to add \(- \lambda^0(s^0) \cdot f^i(s^0) \cdot a^i\) to user i’s objective function. We define \(\lambda ^i(s^0) \triangleq \lambda ^0(s^0) \cdot f^i(s^0)\). Then we can rewrite user i’s decision problem as

$$V^i(s^i) = \max_{a^i \in A^i(s^i)} \left\{ (1-\delta) \left[ u^i(s^i,a^i) - \lambda^i(s^0) \cdot a^i \right] + \delta \sum_{s^{i\prime}} p^i(s^{i\prime}|s^i,a^i)\, V^i(s^{i\prime}) \right\},~\forall s^i.$$

(7)

By contrasting (7) with (6), we can see that, given the price λ i, each user can make its decisions based only on its local information, since the resource constraint has been eliminated. Note, importantly, that the above decision problem (7) for each user i is different from that in [6], which uses a uniform price. This can be seen from the term λ i(s 0) ⋅ a i in (7), where the price λ i(s 0) is user specific and depends on the state, whereas the uniform price in [6] is a constant λ. The decision problem (7) is also different from the subproblem resulting from dual decomposition in NUM, because it is a foresighted optimization problem that aims to maximize the long-term payoff. This requires a different method for calculating the optimal Lagrangian multipliers λ i(s 0) than that used in NUM.
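To illustrate the decoupled problem, the following is a minimal tabular value-iteration sketch for a priced single-user MDP of the form (7); the enumerated-state encoding and all names are our own simplifications, not the algorithm of [6] or of Table 3.

```python
def solve_priced_mdp(states, actions, u, p, resource, lam, delta=0.9, tol=1e-6):
    """Value iteration for the priced Bellman equation (7):
    V(s) = max_a (1 - delta) * [u(s, a) - lam * resource(s, a)]
                 + delta * sum_s' p(s'|s, a) * V(s').

    states:   iterable of user i's states
    actions:  dict mapping each state to its feasible action set A^i(s)
    u:        payoff function u(s, a)
    p:        transition function; p(s, a) returns a dict {s': prob}
    resource: resource usage f^i(s^0) . a^i of action a in state s
    lam:      the user-specific price lambda^i(s^0)
    """
    V = {s: 0.0 for s in states}
    while True:
        V_new = {
            s: max((1 - delta) * (u(s, a) - lam * resource(s, a))
                   + delta * sum(pr * V[s2] for s2, pr in p(s, a).items())
                   for a in actions[s])
            for s in states
        }
        if max(abs(V_new[s] - V[s]) for s in states) < tol:
            return V_new
        V = V_new
```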

Optimal Decentralized Video Transmission Strategy

For the general model described in section “General Model for Video Applications over Cognitive Radio”-A, we propose an algorithm used by the BS to iteratively update the prices and by the users to update their optimal strategies. The algorithm converges to the optimal prices and to the optimal strategy profile that achieves the optimal network utility \(U^\star\). The algorithm is described in Table 3.

Table 3 Distributed algorithm to compute the optimal strategy at time t

Theorem 1

The algorithm in Table 3 converges to the optimal strategy profile, namely,

$$\lim_{t,k \rightarrow \infty} \left| \sum_{s_t^1,\ldots,s_t^I} \sum_{i=1}^I U^i\big(\pi^{i,\lambda^i_{k}}\,\big|\,s_t^i\big) - U^\star \right| = 0.$$

Proof

See the appendix in [7].

We illustrate the BS’s and users’ updates and their information exchange in one time slot in Fig. 2. At the beginning of each time slot t, the BS and the users exchange information to compute the optimal resource price and the optimal actions to take. Specifically, in each iteration k, the BS updates the resource price \(\lambda _k^0\). Then based on the user-specific resource price \(\lambda _k^{i}\), each user i solves for the optimal individual strategy \(\pi ^{i,\lambda _k^{i}}\) and sends the BS its resource request \(f^i \cdot \pi ^{i,\lambda _k^i}(s_t^i)\). Then the BS updates the prices based on the users’ resource requests using the stochastic subgradient method, which can be performed easily. The difference from the dual decomposition in NUM is that each user’s decision problem in our work is a foresighted optimization problem aiming to maximize the long-term, instead of instantaneous, payoff. Our algorithm is also different from the algorithm in [6] in that we have different prices under different states.

Fig. 2
figure 2

Illustration of the interaction between the BS and user i (i.e., their decision-making and information exchange) in one period

From Fig. 2, we can clearly see what information (namely, resource prices \(\lambda _k^0\) and resource requests \(f^i \cdot \pi ^{i,\lambda _k^i}(s_t^i)\)) is exchanged. The amount of information exchanged is small (O(I)), compared to the amount of information required by each user to solve the decision problem (6) directly (the \(\prod_{j \neq i} |S^j|\) states plus the strategies \(\boldsymbol{\pi}^{-i}\) of the other users). In other words, the algorithm enables the entities to exchange a small amount (O(I)) of information and reach the optimal video transmission strategy, achieving the same performance as when each entity knows the complete information (i.e., the states and strategies of all the entities) about the system.
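The price update itself can be sketched in a few lines; the step size 1/k and the fixed iteration count below are illustrative choices, and each user is abstracted as a callable that solves its own priced problem (7) and returns its resource request.

```python
def bs_price_update(users, B, lam0=1.0, n_iters=100):
    """Stochastic-subgradient price update by the BS (in the spirit of the
    exchange in Fig. 2; a sketch, not the exact algorithm of Table 3).

    users: list of callables; users[i](lam) returns user i's resource
           request f^i . pi^{i,lam^i}(s_t^i) after solving its priced MDP
           (the user-specific price lambda^i = lam * f^i is applied inside).
    B:     total available resource in the current slot.
    """
    lam = lam0
    for k in range(1, n_iters + 1):
        requests = [user(lam) for user in users]   # users respond to the price
        excess = sum(requests) - B                 # subgradient of the dual
        lam = max(0.0, lam + (1.0 / k) * excess)   # raise price if over-demanded
    return lam
```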

Optimal Packet Scheduling

In the previous subsection, we proposed an algorithm for optimal foresighted resource allocation and packet scheduling for the general video transmission model described in section “General Model for Video Applications over Cognitive Radio”-A. In the algorithm, each user’s packet scheduling decision is obtained by solving the Bellman equation (7) (see Table 3). The Bellman equation (7) can be solved by a variety of standard techniques such as value iteration. However, the computational complexity of directly applying value iteration may be high, because each user’s state contains the information of all the DUs, so each user’s state space can be very large. In the following, we show that for the specific model described in section “General Model for Video Applications over Cognitive Radio”-B, we can greatly simplify the packet scheduling problem. The key simplification comes from decomposing each user’s packet scheduling problem with multiple DUs into multiple packet scheduling problems, each with a single DU. In this way, we greatly reduce the number of states in each single-DU problem, so that the total complexity of packet scheduling grows linearly with the number of DUs, instead of exponentially as without the decomposition.

The decomposition closely follows the decomposition of multiple-DU packet scheduling problems proposed in [17]. The only difference is that the decision problem (7) in our work has an additional term − λ i(s 0) ⋅ a i due to the price, while such a term does not exist in [17] because the single-user packet scheduling problem is considered in [17].

Lemma 1 (Structural Result)

Suppose DU 1 ∈ C t and DU 2 ∈ C t. If DU 2 depends on DU 1, we should schedule the packets of DU 1 before scheduling the packets of DU 2.

Although Lemma 1 is straightforward, it greatly simplifies the scheduling problem, because we can now take advantage of the partial ordering of the DUs. However, it does not settle the scheduling of DUs that do not depend on each other. Next, we provide the algorithm for optimal packet scheduling in Table 4. The algorithm decomposes the multiple-DU packet scheduling problem into a sequence of single-DU packet scheduling problems and determines how many packets to transmit for each DU sequentially. This greatly reduces the total computational complexity (which is linear in the number of DUs) compared to solving the multiple-DU packet scheduling problem directly (in which the number of states grows exponentially with the number of DUs). The algorithm is similar to [17, Algorithm 2]; the only difference is the term − λ i(s 0) ⋅ a i.

Table 4 The optimal packet scheduling algorithm
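The following sketch conveys the flavor of the decomposition (it is not Table 4 itself): DUs are processed in an order consistent with Lemma 1, and each DU receives a single-DU decision; the greedy per-DU rule is a deliberately simple stand-in for the single-DU foresighted subproblem.

```python
def topological_order(dus, depends_on):
    # Order the DUs so that each DU comes after all DUs it depends on
    # (Lemma 1).  Assumes acyclic dependencies within a GOP, as in the text.
    ordered, placed = [], set()
    while len(ordered) < len(dus):
        for du in dus:
            if du not in placed and all(d in placed for d in depends_on.get(du, ())):
                ordered.append(du)
                placed.add(du)
    return ordered

def schedule_packets(context, x, depends_on, budget):
    # Decomposed multi-DU scheduling: one single-DU decision per DU, taken
    # in dependency order.  'budget' caps the packets sent this slot; the
    # rule "send as much as fits" is a placeholder for the single-DU
    # subproblem solved in Table 4.
    y = {}
    for du in topological_order(context, depends_on):
        y[du] = min(x[du], budget)   # single-DU decision given what remains
        budget -= y[du]
    return y

# DU2 and DU3 depend on DU1, so DU1's packets are scheduled first:
y = schedule_packets(["DU1", "DU2", "DU3"],
                     {"DU1": 3, "DU2": 2, "DU3": 2},
                     {"DU2": ["DU1"], "DU3": ["DU1"]}, budget=5)
```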

Learning Unknown Dynamics

In practice, each entity may not know the dynamics of its own states (i.e., its own state transition probabilities) or even the set of its own states. When the state dynamics are not known a priori, the entities cannot solve their decision problems using the distributed algorithm in Table 3. In this case, we can adapt the online learning algorithm based on post-decision states (PDS) in [17], originally proposed for single-user wireless video transmission, to the deployment scenario considered here.

The main idea of the PDS-based online learning is to learn the post-decision value function, instead of the value function. Each user i’s post-decision value function is defined as \(\tilde {U}^i(\tilde {x}^i,\tilde {h}^i)\), where \((\tilde {x}^i,\tilde {h}^i)\) is the post-decision state. The difference from the normal state is that the PDS \((\tilde {x}^i,\tilde {h}^i)\) describes the status of the system after the scheduling action is made but before the DUs in the next period arrive. Hence, the relationship between the PDS and the normal state is

$$\displaystyle \begin{aligned} \tilde{x}^i = x^i - a^i,~\tilde{h}^i=h^i. \end{aligned}$$

Then the post-decision value function can be expressed in terms of the value function as follows:

$$\tilde{U}^i(\tilde{x}^i,\tilde{h}^i) = \sum_{x^{i\prime},h^{i\prime}} p^i\big(x^{i\prime},h^{i\prime}\,\big|\,\tilde{x}^i + a^i,\tilde{h}^i\big)\cdot V^i(x^{i\prime}, h^{i\prime}).$$

In PDS-based online learning, the normal value function and the post-decision value function are updated in the following way:

$$\displaystyle \begin{aligned} \begin{array}{rcl} V^{i}_{k+1}(x^{i}_k, h^{i}_k) &\displaystyle =&\displaystyle \max_{a^i}~(1-\delta) \cdot u^i(x^{i}_k,h^{i}_k,a^i) \\ &\displaystyle &\displaystyle +~\delta \cdot U^{i}_k(x^{i}_k+(a^i-l^{i}_k),h^{i}_k), \\ U^{i}_{k+1}(x^{i}_k, h^{i}_k) &\displaystyle =&\displaystyle (1-\frac{1}{k}) U^{i}_k(x^{i}_k, h^{i}_k) \\ &\displaystyle &\displaystyle +~\frac{1}{k} \cdot V^{i}_k(x^{i}_k-(a^i-l^{i}_k), h^{i}_k). \end{array} \end{aligned} $$

We can see that the above updates do not require any knowledge about the state dynamics. Based on these updates, we propose the decomposed optimal packet scheduling with PDS-based learning in Table 5. Note that the difference between the learning algorithm in Table 5 and the algorithm in Table 4, which assumes statistical knowledge, is that the former uses the post-decision value function instead of the normal value function. It is proved in [17] that PDS-based online learning converges to the optimal value function. Hence, the distributed packet scheduling and resource allocation solution in Table 3 can be modified by letting each user perform packet scheduling using the PDS-based learning in Table 5.

Table 5 The optimal decomposed packet scheduling algorithm with PDS-based learning
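A minimal sketch of one learning iteration is given below. It follows the PDS definition \(\tilde{x}^i = x^i - a^i\) from above; the tabular dictionaries, the scalar buffer state, and the arrival bookkeeping are our own simplifications of the displayed updates, not the exact algorithm of Table 5.

```python
def pds_update(V, U, x, h, actions, u, l_k, delta, k):
    """One iteration of PDS-based online learning (a tabular sketch).

    V, U:  dicts from (x, h) to the normal / post-decision value.
    l_k:   realized number of newly arrived packets in this iteration.
    Returns the greedy action.  Note that no transition probabilities
    are needed anywhere, which is the point of PDS-based learning.
    """
    # Greedy step using the learned PDS values:
    #   V(x, h) = max_a (1 - delta) * u(x, h, a) + delta * U(x - a, h).
    best_a = max(actions(x, h),
                 key=lambda a: (1 - delta) * u(x, h, a)
                               + delta * U.get((x - a, h), 0.0))
    V[(x, h)] = ((1 - delta) * u(x, h, best_a)
                 + delta * U.get((x - best_a, h), 0.0))
    # Averaging step: fold the realized next-state value into the PDS value,
    #   U(x~, h) <- (1 - 1/k) * U(x~, h) + (1/k) * V(x~ + l_k, h).
    xt = x - best_a
    U[(xt, h)] = ((1 - 1 / k) * U.get((xt, h), 0.0)
                  + (1 / k) * V.get((xt + l_k, h), 0.0))
    return best_a
```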

Strategy-Proof Resource Allocation Mechanisms

When the users are self-interested, they may not follow the solutions introduced in section “Optimal Foresighted Video Transmission”. In particular, they may not be truthful in the message exchange with the network manager. There are several ways of designing strategy-proof resource allocation mechanisms, based on pricing [3] and mechanism design [1,2,3,4,5]. In this section, we describe a representative framework based on auctions [4, 5].

The auction-based resource allocation mechanism is illustrated in Fig. 3. The basic procedure at each time slot is as follows (a simple sketch of one auction round is given at the end of this section):

  1. The network manager announces the total amount of available resources (e.g., the state s 0).

  2. The SUs submit bids indicating how much resource they are willing to get.

  3. Based on the SUs’ bids, the network manager determines the resource allocation and the payments required from the SUs.

  4. Based on the allocated resources, the SUs schedule their video packets.

Fig. 3
figure 3

Illustration of strategy-proof resource allocation mechanism based on auctions

As we can see, most elements (e.g., states, rewards) in auction-based mechanisms are the same as in the general model in section “General Model for Video Applications over Cognitive Radio”. Here we list some key features of the auction-based mechanism.

  • In the auction-based mechanism, the SUs’ actions consist of two types: internal actions and external actions. The internal actions, denoted by \(b_i^t\) in Fig. 3, are the packet scheduling actions described in section “General Model for Video Applications over Cognitive Radio”. The external actions are unique to the auction-based mechanism: each SU’s external action is its bid, i.e., the amount of resources it wants to get.

  • In the auction-based mechanism, the network manager directly allocates the resources to the SUs and announces the payments required from the SUs. This differs from the price-based solution in section “Optimal Foresighted Video Transmission”, where the network manager announces the prices and the SUs determine their resource usage based on the prices.

We refer interested readers to [4, 5] for detailed descriptions and theoretical results of the auction-based mechanisms.
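To fix ideas, here is a deliberately simple sketch of one auction round (steps 2 and 3 of the procedure above). The proportional allocation and the pay-your-bid payment rule are illustrative placeholders only; the actual mechanism, and the payment rule that makes it strategy-proof, are those designed in [4, 5].

```python
def auction_round(B, bids):
    # One round of steps 2-3: turn the SUs' bids into an allocation of the
    # divisible resource B and per-SU payments.  Placeholder rules only.
    total = sum(bids.values())
    if total <= 0:
        return {i: 0.0 for i in bids}, {i: 0.0 for i in bids}
    allocation = {i: B * bid / total for i, bid in bids.items()}
    payments = dict(bids)   # placeholder pay-your-bid rule
    return allocation, payments

allocation, payments = auction_round(B=10.0, bids={"SU1": 3.0, "SU2": 1.0})
```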

Conclusion and Future Directions

In this chapter, we introduce the optimal foresighted resource allocation and packet scheduling for multiuser wireless video transmission over cognitive radio networks. The introduced solution achieves the optimal long-term video quality subject to each user’s minimum video quality guarantee, by dynamically allocating resources among the users and dynamically scheduling the users’ packets while taking into account the dynamics of the video traffic and channel states. We develop a low-complexity algorithm that can be implemented by the network manager and the users in an informationally decentralized manner and converges to the optimal solution. We also introduce strategy-proof variations of our solutions for self-interested users.

There are many important future research directions. First, we have focused on the case where users have orthogonal spectrum access. One interesting direction is to allow non-orthogonal spectrum access through power control, which further complicates the coupling among the users: orthogonal spectrum access yields resource allocation constraints that are linear inequalities, whereas non-orthogonal spectrum access destroys this linearity. Theoretically, the resulting problems are strongly coupled multiuser MDPs, instead of the weakly coupled multiuser MDPs considered in this chapter.

Second, in the scenarios where users are self-interested, a theoretical analysis of the efficiency at the equilibrium is still missing. There are existing works in the computer science and economics literature that analyze the Price of Anarchy (PoA) and the efficiency of learning in games. However, most of these works focus on one-shot games (i.e., on myopic users). When the users are foresighted, the problem is more challenging and calls for novel analytical frameworks.