1 Introduction

Recent widespread acceptance of wireless applications has triggered a huge demand for radio spectrum. For many years, radio spectrum has been assigned to licensed (primary) users, yet most of the time some frequency bands remain largely unoccupied by them. Spectrum usage measurements by the Federal Communications Commission (FCC) show that at any given time and location, most of the spectrum is actually idle. That is, the spectrum shortage results from the spectrum management policy rather than from an actual physical scarcity of usable spectrum. Cognitive radio (CR), introduced in [1], is considered an enabling technology that allows unlicensed (secondary) users to operate in the licensed spectrum bands, which can help overcome the lack of available spectrum in wireless communications. A CR is capable of sensing its surrounding environment and adapting its internal states by making corresponding changes in certain operating parameters [2]. The FCC in the United States has begun to consider more flexible use of the available spectrum, and the NeXt Generation program of the Defense Advanced Research Projects Agency (DARPA) also aims to redistribute allocated spectrum dynamically.

One important application of CR is spectrum overlay dynamic spectrum access (DSA), where secondary users operate in the licensed band while limiting interference to primary users. Secondary users detect and exploit spectrum opportunities in the time and frequency domains [3]. An optimal spectrum sensing strategy that maximizes throughput is proposed in [4]. A separation principle is established in [5] that decouples the design of the sensing strategy from that of the spectrum sensor and the access strategy. The benefits of cooperation in CR are illustrated in [6, 7] for two-user and multi-user networks, respectively. A dynamic frequency hopping scheme is presented in [8] for IEEE 802.22 wireless regional area networks, an emerging standard based on CR technologies. In [9], the authors present a game-theoretic dynamic spectrum sharing framework for analyzing network users’ behaviors, designing efficient distributed dynamic schemes, and analyzing optimality. Other game-theoretic DSA methods are presented in [10, 11]. The authors in [12] exploit channel availability in the time domain and demonstrate the throughput performance of a Bluetooth/WLAN system. Spectrum opportunities are also exploited in the time domain in [13], where the authors present an ad hoc secondary MAC protocol to facilitate DSA.

Although much work has been done on CR networks, most previous work treats maximizing the throughput of secondary users as one of the most important design criteria. As a consequence, other QoS measures for secondary users, such as distortion in multimedia applications, are mostly ignored in the literature. However, recent work in cross-layer design shows that maximizing throughput does not necessarily improve application layer QoS for some multimedia applications, such as video [14, 15]. From a user’s point of view, QoS at the application layer is more important than QoS at other layers. Moreover, CR-based services for secondary users would have strictly lower QoS than radio services that enjoy guaranteed spectrum access [16]. Therefore, if application layer QoS is not carefully considered in CR networks, the perceived reduction in QoS associated with CR may impede the success of CR technologies.

Multimedia applications such as video telephony, conferencing, and video surveillance are being targeted for wireless networks, including CR networks. Lossy video compression standards, such as MPEG-4 and H.264, exploit the spatial and temporal redundancy in video streams to reduce the bandwidth required to transmit video. Compressed video comprises intra-coded and inter-coded frames. The intra refreshing rate is an important application layer parameter [17]. For online video encoding, adaptively adjusting the intra refreshing rate can improve error resilience against the time-varying wireless channels available to secondary users in CR networks.

Cross-layer wireless multimedia transmission, where parameters are optimized jointly across OSI layers, has been well studied in the literature [18]. Recent work shows promising improvements in video QoS from considering the resource management, adaptation, and protection strategies available at the physical, medium access control, and network/transport layers in conjunction with multimedia compression and streaming algorithms [19]. Various channel-adaptive, distortion-driven cross-layer transmission strategies have been explored. The authors in [20] investigate a classification-based system in which the optimal cross-layer strategies for various video and channel conditions are computed offline, thereby reducing the transmission-time complexity of the compression and transmission strategy. Within a rate-distortion framework, source coding, retransmission, and adaptive modulation parameters are jointly considered for video summaries in [21]. The authors in [22] take a cross-layer approach to allocating power level, source coding rate, and channel coding rate, delivering basic and enhanced QoS levels to distant and near receivers in a CDMA network.

Although there are several cross-layer design techniques for wireless multimedia transmission in the literature, little work investigates channel-adaptive multimedia transmission over cognitive radio networks. In this paper, we take an integrated design approach to jointly optimize application layer QoS for multimedia transmission over cognitive radio networks. Based on the sensed channel condition, secondary users can adapt the intra refreshing rate at the application layer in addition to the parameters at other layers. The distinct features of the proposed scheme are as follows.

  • For secondary users in CR networks, channel selection for spectrum sensing, access decision, and intra refreshing rate are determined concurrently to maximize the QoS at the application layer (i.e., minimize distortion for video applications).

  • Physical layer channel state information (CSI) (channel gain) is used by secondary users to help make the optimal decision to maximize the application layer QoS.

  • Primary network usage and channel gain are jointly modeled as a finite state Markov process. Because of channel sensing and CSI errors, the state cannot be observed directly. Following the work in [5], we formulate the whole system as a partially observable Markov decision process (POMDP) [23], and we extend that formulation to jointly optimize application layer QoS for multimedia transmission over cognitive radio networks.

  • Using simulation examples, we show that application layer parameters have a significant impact on the QoS perceived by secondary users in CR networks. We also show that application layer QoS can be improved significantly if the intra refreshing rate is adapted together with lower layer parameters, such as those for spectrum sensing. This study reveals a number of interesting observations and provides insights into the design and optimization of CR networks from a cross-layer perspective.

The rest of the paper is organized as follows. Section 2 describes the problem of multimedia transmission over CR networks. Section 3 presents the proposed scheme. Simulation results are given in Sect. 4. Finally, we conclude this study in Sect. 5.

2 Multimedia transmission over cognitive radio networks

In this section, we describe the multimedia rate-distortion model used in this paper. We then present the system model for multimedia transmission over cognitive radio networks.

2.1 Rate-distortion (R-D) model for multimedia applications

Wireless channels have limited bandwidth and are error-prone. Highly efficient coding algorithms such as H.264 and MPEG-4 can compress video to reduce the bandwidth required for the video stream. Rate control is used in video coding to adjust the encoder output bit rate to the prevailing conditions in order to improve video quality [24]. For example, the main rate control tasks in MPEG-4 object-based video coding are (1) to determine how many bits are assigned to each video object in the scene and (2) to adjust the quantization parameter to accurately achieve the target coding bit rate [25].

Highly compressed video data is vulnerable to packet losses, and a single bit error may cause severe distortion [26]. This vulnerability makes error resilience at the video encoder essential. Intra update, also called intra refreshing, of macroblocks (MBs) is one approach to video error resilience and protection [27]. An intra-coded MB does not need information from previous frames, which may already have been corrupted by channel errors, so intra coding is an effective way to mitigate error propagation. In contrast, with inter-coded MBs, channel errors in previous frames may propagate to the current frame along the motion compensation path [28].

Given a source coding bit rate \(R_s\) and an intra refreshing rate, we need a model to estimate the corresponding source distortion \(D_s\). The authors in [17] provide a closed-form distortion model that takes into account the varying characteristics of the input video, the sophisticated data representation scheme of the coding algorithm, and the intra refreshing rate. Based on a statistical analysis of error propagation, error concealment, and channel decoding, they develop a theoretical framework to estimate the channel distortion \(D_c\). Coupling this framework with the R-D model for source coding and with time-varying wireless channels, they propose an adaptive mode selection scheme for wireless video coding and transmission.

We use the rate-distortion model described in [17] in our study. This R-D model facilitates adaptive intra mode selection and joint source-channel rate control.

The total end-to-end distortion comprises \(D_s\), the quantization distortion introduced by the lossy video encoder to meet a target bit rate, and \(D_c\), the distortion resulting from channel errors. For DCT-based video coding, intra coding of an MB or a frame usually requires more bits than inter coding, since inter coding removes the temporal redundancy between neighboring frames. Let \(\beta\) be the intra refreshing rate, i.e., the fraction of MBs coded in intra mode. Inter coding of MBs has much better R-D performance than intra coding, so decreasing the intra refreshing rate decreases the source distortion for a target bit rate. However, inter coding relies on information in previous frames: packet losses due to channel errors result in error propagation along the motion compensation path until the next intra-coded MB is received, so increasing the intra refreshing rate decreases the channel distortion. Thus there is a tradeoff between source and channel distortion when selecting the intra refreshing rate. We aim to find the optimal \(\beta\) that minimizes the total end-to-end distortion for a given channel bandwidth and packet loss ratio.

The source distortion is given by

$$ D_s(R_s,\beta) = D_s(R_s,0) + \beta(1-\eta + \eta \beta)[D_s(R_s,1) - D_s(R_s,0)], $$
(1)

where \(R_s\) denotes the source coding rate, \(\beta\) is the intra refreshing rate, and \(\eta\) is a constant that depends on the video sequence. \(D_s(R_s,0)\) and \(D_s(R_s,1)\) denote the time-averaged source distortion over K time slots when all MBs in all frames are coded in inter mode and in intra mode, respectively.

$$ D_s(R_s,0) = \frac{1}{K} \sum_{k=1}^{K} \frac{1}{M_k} \sum_{m=1}^{M_k} D_s(R_s,0,m), $$
(2)
$$ D_s(R_s,1) = \frac{1}{K} \sum_{k=1}^{K} \frac{1}{M_k} \sum_{m=1}^{M_k} D_s(R_s,1,m), $$
(3)

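where \(M_k\) is the number of frames in time slot \(k\), and \(D_s(R_s,0,m)\) and \(D_s(R_s,1,m)\) are the source distortions of frame \(m\) under all-inter and all-intra coding, respectively. The average channel distortion for each time slot is given by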

$$ D_c(p,\beta) = \left( \frac{a}{1-b+b \beta} \right) \left( \frac{p}{1-p} \right) E[F_d(m,\,m-1)], $$
(4)

where \(p\) is the packet loss rate, \(b\) is a constant describing the motion randomness of the video scene, \(a\) is the energy loss ratio of the encoder filter, and \(E[F_d(m, m-1)]\) is the average of the frame difference \(F_d(m, m-1)\) over K slots. We use the same error concealment strategy and packet loss ratio derivation as described in [17].

The total average distortion is given by

$$ D(R_s, p, \beta) = D_s(R_s,\beta) + D_c(p,\beta). $$
(5)

The optimal intra refreshing rate \(\beta^*\) is then selected to minimize the total distortion:

$$ \beta^* = \hbox{arg} \min_\beta D(R_s,p, \beta). $$
(6)
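A minimal Python sketch of Eqs. (1) to (6) follows. It assumes that \(\beta\) is chosen from a finite grid (the spacing of 0.01 is a hypothetical stand-in for the quantized set \({\mathbb{A}}_\beta\)) and uses the R-D parameter values listed in Sect. 4; the function names are illustrative and do not come from [17].

```python
# Sketch of the R-D model in Eqs. (1)-(6); function names are hypothetical.

def source_distortion(beta, ds0, ds1, eta):
    """Eq. (1): source distortion for intra refreshing rate beta in [0, 1]."""
    return ds0 + beta * (1.0 - eta + eta * beta) * (ds1 - ds0)


def channel_distortion(p, beta, a, b, frame_diff):
    """Eq. (4): channel distortion for packet loss rate p."""
    denom = 1.0 - b + b * beta
    if p >= 1.0 or denom <= 0.0:
        return float("inf")  # all packets lost, or no intra refresh when b = 1
    return (a / denom) * (p / (1.0 - p)) * frame_diff


def total_distortion(p, beta, prm):
    """Eq. (5): D = D_s + D_c."""
    return (source_distortion(beta, prm["ds0"], prm["ds1"], prm["eta"])
            + channel_distortion(p, beta, prm["a"], prm["b"], prm["frame_diff"]))


def optimal_beta(p, prm, grid):
    """Eq. (6): grid-search approximation of beta* = argmin_beta D."""
    return min(grid, key=lambda beta: total_distortion(p, beta, prm))


# R-D parameter values used for all simulations in Sect. 4.
PARAMS = {"ds0": 74.0, "ds1": 124.0, "eta": 1.4, "a": 0.01, "b": 1.0, "frame_diff": 100.0}
# Hypothetical quantized set A_beta: 0.01, 0.02, ..., 1.00.
BETA_GRID = [i / 100 for i in range(1, 101)]

for p in (0.01, 0.1, 0.3):
    b_star = optimal_beta(p, PARAMS, BETA_GRID)
    print(f"p = {p:.2f}: beta* = {b_star:.2f}, D = {total_distortion(p, b_star, PARAMS):.3f}")
```

With these parameters the sketch selects a larger \(\beta^*\) as the packet loss rate grows (0.17 at \(p = 0.1\) and 0.21 at \(p = 0.3\) on this grid), which is the source/channel tradeoff described above.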

2.2 System model

Consider a spectrum that consists of N channels, each with bandwidth W(n), 1 ≤ n ≤ N. These N channels are licensed to primary users. Time is divided into slots of equal length T. Slot k refers to the discrete time period [kT, (k + 1)T].

When a channel is not in use by primary users, the secondary link over it experiences additive white Gaussian noise (AWGN) and fading. The fading process and primary usage of a channel can be jointly represented by a stationary and ergodic S-state Markov chain. Let \(i\) and \(\gamma\) denote the instantaneous channel state and fading gain, respectively. When the channel is in state \(i\), \(1 \le i \le S-1\), the quantized fading gain is \(\gamma_i\), where \(\gamma_i \le \gamma < \gamma_{i+1}\). When the channel is in state \(i = S\), it is in use by the primary network. We assume that the phase of the channel attenuation can be perfectly estimated and removed at the receiver. The S-state Markov channel model is completely described by the stationary probability \(p(i)\) of each channel state \(i\) and the probability \(P_{ij}\) of transitioning from state \(i\) to state \(j\) after each time slot, \(1 \le i, j \le S\).

In general, a finite state Markov channel (FSMC) model is constructed for a particular fading distribution by first partitioning the range of the fading gain into a finite number of intervals, each of which corresponds to a state of the Markov chain. The application of FSMCs to model Rayleigh fading channels has been well studied in [29, 30]. Given knowledge of the fading process and the primary network usage, the stationary distribution \(p(i)\) and the state transition probabilities \(\{P_{ij}\}\) can be derived. Once a channel gain has been determined for each of the states \(1, 2, \ldots, S-1\), the packet loss ratio of each state follows from the modulation and channel coding schemes. The intra refreshing rate that minimizes the total distortion in each state can then be calculated using the rate-distortion model.
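As an illustration of this per-state computation, the sketch below assumes uncoded BPSK over AWGN, so that the bit error rate at per-bit SNR \(\gamma\) is \(Q(\sqrt{2\gamma}) = \frac{1}{2}\mathrm{erfc}(\sqrt{\gamma})\), fixed 800-bit packets with independent bit errors, and hypothetical state gains of 6, 8, 10, and 12 dB. None of these choices come from the paper, which does not fix the modulation and coding schemes here; only the R-D parameters are those listed in Sect. 4.

```python
import math


def packet_loss_ratio(snr_db, packet_bits):
    """Assumed model: uncoded BPSK over AWGN with independent bit errors."""
    snr = 10.0 ** (snr_db / 10.0)
    ber = 0.5 * math.erfc(math.sqrt(snr))  # equals Q(sqrt(2 * snr))
    return 1.0 - (1.0 - ber) ** packet_bits


def total_distortion(p, beta, ds0=74.0, ds1=124.0, eta=1.4, a=0.01, b=1.0, fd=100.0):
    """Eqs. (1), (4), (5) with the R-D parameters of Sect. 4 as defaults."""
    ds = ds0 + beta * (1.0 - eta + eta * beta) * (ds1 - ds0)
    dc = (a / (1.0 - b + b * beta)) * (p / (1.0 - p)) * fd
    return ds + dc


def beta_per_state(state_snr_db, packet_bits, grid):
    """Map each available state 1..S-1 to (packet loss ratio, beta*)."""
    table = []
    for snr_db in state_snr_db:
        p = packet_loss_ratio(snr_db, packet_bits)
        beta_star = min(grid, key=lambda beta: total_distortion(p, beta))
        table.append((p, beta_star))
    return table


STATE_SNR_DB = [6.0, 8.0, 10.0, 12.0]         # hypothetical quantized gains, S - 1 = 4
BETA_GRID = [i / 100 for i in range(1, 101)]  # hypothetical A_beta

for snr_db, (p, beta_star) in zip(STATE_SNR_DB, beta_per_state(STATE_SNR_DB, 800, BETA_GRID)):
    print(f"{snr_db:5.1f} dB: packet loss = {p:.4f}, beta* = {beta_star:.2f}")
```

The resulting table is computed once per state; better channel states map to lower packet loss and therefore to a smaller intra refreshing rate.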

At the beginning of a slot, the secondary transmitter selects a set of channels to sense. Based on the sensing outcome, it decides whether or not to access a channel. If it decides to access a channel, it selects the application layer parameters and transmits the video content. At the end of the slot, the receiver acknowledges the transfer by sending the perceived channel gain back to the transmitter. We assume a system for real-time multimedia applications in which packets are discarded if a primary user occupies the slot or if the channel is not accessed. Figure 1 shows the system block diagram for video transmission between two secondary users.

Fig. 1 Block diagram of multimedia transmission over cognitive radio networks

3 Solving the multimedia transmission over cognitive radio networks problem

In CR networks with multimedia applications, we need to determine the optimal policy for channel sensing selection, sensor operating point, access decision, and intra refreshing rate that minimizes the application layer distortion subject to a constraint on the probability of collision with primary users. Because of channel sensing and CSI errors, the system state cannot be observed directly. Following the work in [5], we formulate the whole system as a POMDP. A single POMDP formulation for all policies under the collision probability constraint would result in a constrained POMDP. However, constrained POMDPs require randomized policies to achieve optimality, which is often intractable. Therefore, we use the separation principle in [5] for the sensor operating point and the access decision. The spectrum sensor operating point is set such that \(\delta = \zeta\), where \(\delta\) is the probability of missed detection of a busy channel (one used by primary users) and \(\zeta\) is the maximum allowed probability of collision.

At the beginning of each slot, the system transitions to a new state. Using a POMDP-derived policy, a channel is selected for spectrum sensing, and an access decision is made based on the sensing observation. An intra refreshing rate is then selected based on the belief about the channel state. The receiver acknowledges the transfer by sending the quantized perceived channel gain back to the secondary transmitter, and the immediate cost for the slot is determined by the operations performed in that slot.

The system can be formulated as a POMDP with states, actions, transition probabilities, observations, and cost structures as follows.

3.1 State space, transition probabilities and observation space

The system state is given by the network usage of primary users and the channel state information. Let \(\{X(n)\}\) denote an S-state Markov chain for channel \(n\), \(X(n) \in {\mathbb{X}} = \{e_1, e_2, \ldots, e_{S-1},e_S\}\), where \(e_i\) denotes the S-dimensional unit vector with 1 in the ith position and zeros elsewhere. The system with N channels is modeled as a discrete-time homogeneous Markov process with \(S^N\) states. The system state in time slot \(k\) is \(V_k = [X_k(1), \ldots, X_k(N)]\).

To simplify the presentation, we consider a system with a single channel in the formulation; the extension to multiple channels, which we consider in our simulations, is straightforward. For a single channel, \(V_k = X_k\), and the transition probabilities of the system state are given by the \(S \times S\) matrix \(A\). We assume the transition probabilities are known from the network usage and channel fading characteristics.
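The multichannel extension can be sketched under an additional assumption that is ours, not the paper’s: the N channels evolve independently. The joint transition matrix over the \(S^N\) states of \(V_k\) is then the Kronecker product of the per-channel matrices, with joint states ordered lexicographically as \((X_k(1), \ldots, X_k(N))\). The per-channel matrix in the example is hypothetical.

```python
from functools import reduce
from itertools import product


def kron(A, B):
    """Kronecker product of two matrices given as nested lists."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def joint_transition(per_channel):
    """Transition matrix of V_k = [X_k(1), ..., X_k(N)] for independent channels."""
    return reduce(kron, per_channel)


def joint_states(S, N):
    """Joint states in the same (lexicographic) order as the Kronecker product rows."""
    return list(product(range(1, S + 1), repeat=N))


# Hypothetical 3-state channel: states 1 and 2 available, state 3 busy (e_S).
A1 = [[0.85, 0.10, 0.05],
      [0.10, 0.85, 0.05],
      [0.45, 0.45, 0.10]]

A_joint = joint_transition([A1, A1])
states = joint_states(3, 2)
print(len(A_joint), "joint states")
print("Pr{(1,1) -> (3,3)} =", A_joint[states.index((1, 1))][states.index((3, 3))])
```

If the channels were correlated (for example, a primary user occupying several channels at once), the joint matrix would have to be specified directly instead.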

The observation available to the secondary transmitter and receiver is the sensing outcome together with the channel gain acknowledgment, \(Y_k \in {\mathbb{Y}}\), where \({\mathbb{Y}} = \{{\gamma}_1, \ldots, {\gamma}_{S-1}, {\gamma}_S \hbox{ (the channel is used by primary users)}\}\) and \(\gamma_i < \gamma_j\) for all \(i < j\).

The spectrum sensor observation may differ at the transmitter and the receiver. If the transmitter and receiver derive the information state (described in the following subsection) from the same observations, the information state can be used to maintain frequency hopping synchronization. Therefore, the information state is updated with \(Y_k\) only and does not include the spectrum sensor observation.

Let \(B(y, x, a) = \hbox{Pr}\{y|x, a\}\) denote the conditional probability of observing \(y\) given that the system is in state \(x\) and composite action \(a\) is taken:

$$ B(y,x,a) = \left\{ \begin{array}{ll} P_{ce}(x, v(y)) (1-\epsilon), & \hbox{if } y \neq {\gamma}_S, x \neq e_S,\\ \epsilon, & \hbox{if } y = {\gamma}_S, x \neq e_S\\ 0, & \hbox{if } y \neq {\gamma}_S, x = e_S\\ 1, & \hbox{if } y = {\gamma}_S, x = e_S, \end{array}\right. $$
(7)

where \(\epsilon\) is the probability of false alarm (an idle channel is declared busy) and \(v(y) = i\), \(1 \le i \le S-1\), for \(y = \gamma_i\). When the channel is available and accessed, \(P_{ce}(x, v(y))\) is the probability that the receiver’s quantized channel estimate is \(\gamma_{v(y)}\) when the true state is \(x\).

Following Hoang and Motani [31], we assume that the channel estimation error is Gaussian with zero mean and variance \(\sigma^2\). At a particular time and channel, the estimated channel gain is

$$ \hat{\gamma} = \gamma_i + w, $$
(8)

where \(\gamma_i\) is the actual channel gain and \(w \sim \mathcal{N}(0, \sigma^2)\) is the estimation error. The receiver then quantizes the estimate to the nearest channel gain level. The probability that \(\hat{\gamma}\) is closest to \(\gamma_j\) is given by

$$ P_{ce}(i,\,j) = \left\{ \begin{array}{ll} \frac{1}{2} \left[ \mathrm{erf} \left(\frac{\gamma_j + \gamma_{j+1} - 2\gamma_i}{2\sqrt{2}\sigma} \right) - \mathrm{erf} \left( \frac{\gamma_j + \gamma_{j-1} - 2\gamma_i}{2\sqrt{2}\sigma}\right) \right], & \hbox{if } 1 < j < S-1,\\ \frac{1}{2} \left[1 + \mathrm{erf} \left( \frac{\gamma_1 + \gamma_2 - 2\gamma_i}{2\sqrt{2}\sigma}\right) \right], & \hbox{if } j = 1, \\ \frac{1}{2} \left[1 - \mathrm{erf} \left( \frac{\gamma_{S-2} + \gamma_{S-1} - 2\gamma_i}{2\sqrt{2}\sigma}\right)\right], & \hbox{if } j = S-1, \\ 0, & \hbox{if } j = S, \end{array} \right. $$
(9)

where \(\mathrm{erf}(\cdot)\) denotes the error function.
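Equations (7) to (9) can be turned into an observation matrix with a few lines of Python. The sketch below uses 0-based indices (available states 0 to S−2, busy state S−1), writes \(\Phi\) for the standard normal CDF so that each case of (9) becomes \(\Phi(\cdot) - \Phi(\cdot)\) over the quantization cell of \(\gamma_j\), and, as in (7), does not make the entries depend on the action. The gain levels are hypothetical, while \(\sigma = 0.1\) and \(\epsilon = 0.6\) are values used in Sect. 4. A Monte Carlo run of the estimator in (8) serves as a sanity check.

```python
import math
import random


def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def p_ce(i, j, gains, sigma):
    """Eq. (9): Pr{nearest quantized level is gains[j] | true gain gains[i]}.

    gains holds the S-1 available-state levels in increasing order (0-based);
    the busy index j = len(gains) has probability 0.
    """
    if j == len(gains):
        return 0.0
    lo = -math.inf if j == 0 else (gains[j - 1] + gains[j]) / 2.0
    hi = math.inf if j == len(gains) - 1 else (gains[j] + gains[j + 1]) / 2.0
    return phi((hi - gains[i]) / sigma) - phi((lo - gains[i]) / sigma)


def observation_matrix(gains, sigma, eps):
    """Eq. (7): B[x][y] = Pr{y | x}; the last row/column is the busy state e_S."""
    S = len(gains) + 1
    B = []
    for x in range(S - 1):  # available states
        row = [p_ce(x, y, gains, sigma) * (1.0 - eps) for y in range(S - 1)]
        row.append(eps)  # false alarm: idle channel reported busy
        B.append(row)
    B.append([0.0] * (S - 1) + [1.0])  # busy state always yields gamma_S
    return B


def monte_carlo_p_ce(i, gains, sigma, n=20000, seed=1):
    """Empirical distribution of the quantized estimate of Eq. (8)."""
    rng = random.Random(seed)
    counts = [0] * len(gains)
    for _ in range(n):
        g_hat = gains[i] + rng.gauss(0.0, sigma)
        j = min(range(len(gains)), key=lambda k: abs(gains[k] - g_hat))
        counts[j] += 1
    return [c / n for c in counts]


GAINS = [0.2, 0.4, 0.6, 0.8]  # hypothetical levels gamma_1..gamma_{S-1}, S = 5
B = observation_matrix(GAINS, sigma=0.1, eps=0.6)
print([round(p_ce(1, j, GAINS, 0.1), 4) for j in range(len(GAINS))])
print([round(v, 4) for v in monte_carlo_p_ce(1, GAINS, 0.1)])
```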

3.2 Information state

The information state is a central concept in POMDPs. We refer to a probability distribution over the states as an information state and to the set of all such probability distributions as the information space. The information spaces for 2-state and 3-state systems are shown in Fig. 2. For a system with two states, the information space is a one-dimensional line segment: the distance from the right end is the first component \(\pi(1)\), and the distance from the left end is the second component \(\pi(2)\). For a system with three states, the information space is a two-dimensional triangle, and the components of a point are given by its perpendicular distances to the sides of the triangle. An information state is a sufficient statistic for the history of decisions and observations.

Fig. 2 Information state in POMDP

3.3 Action space

Due to hardware limitations, we assume that a secondary user is equipped with a single Neyman–Pearson energy detector and can sense only L = 1 channel at each time instant. In each slot k, the secondary user needs to decide whether or not to sense, which operating point on the receiver operating characteristic (ROC) curve to use, whether to access the channel, and which quantized intra refreshing rate to use. Thus the action space consists of four parts: a channel sensing decision \(a_s(k) \in \{0 \hbox{ (no sense)}, 1 \hbox{ (sense)}\}\); a spectrum sensor design \((\epsilon(k), \delta(k)) \in {\mathbb{A}}_{\epsilon \delta}\), where \({\mathbb{A}}_{\epsilon \delta}\) is the set of valid points on the ROC curve; an access decision \(a_a(k) \in \{0 \hbox{ (no access)}, 1 \hbox{ (access)}\}\); and an intra refreshing rate \(\beta(k) \in {\mathbb{A}}_{\beta}\). The composite action in slot k is denoted by \(a_k = \{a_s(k), (\epsilon(k), \delta(k)), a_a(k), \beta(k) \} \in \{0,1\} \times {\mathbb{A}}_{\epsilon \delta} \times \{0,1\} \times {\mathbb{A}}_\beta\).
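The composite action space is a Cartesian product and can be enumerated directly. In the sketch below, the ROC operating points and the \(\beta\) levels are hypothetical. The helper `choose_operating_point` is our discrete-ROC reading of the separation principle of [5], which sets \(\delta = \zeta\): among the available points it picks the one with the smallest false alarm probability \(\epsilon\) whose miss detection probability satisfies \(\delta \le \zeta\), which coincides with \(\delta = \zeta\) on a continuous ROC.

```python
from itertools import product

# Hypothetical ROC points (epsilon, delta): a lower false alarm costs more missed detections.
ROC_POINTS = [(0.6, 0.01), (0.4, 0.05), (0.2, 0.10)]
BETA_LEVELS = [0.1, 0.2, 0.3, 0.4]  # hypothetical quantized set A_beta


def composite_actions(roc_points, beta_levels):
    """All a_k = (a_s, (epsilon, delta), a_a, beta) in {0,1} x A_ed x {0,1} x A_beta."""
    return list(product((0, 1), roc_points, (0, 1), beta_levels))


def choose_operating_point(roc_points, zeta):
    """Smallest-epsilon ROC point meeting the collision bound delta <= zeta."""
    feasible = [pt for pt in roc_points if pt[1] <= zeta]
    if not feasible:
        raise ValueError("no ROC point satisfies delta <= zeta")
    return min(feasible, key=lambda pt: pt[0])


actions = composite_actions(ROC_POINTS, BETA_LEVELS)
print(len(actions), "composite actions")
print("operating point for zeta = 0.05:", choose_operating_point(ROC_POINTS, 0.05))
```

Fixing the operating point in advance removes the ROC dimension from the search, which is the practical benefit of applying the separation principle.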

Due to sensing and channel estimation errors, a secondary user cannot directly observe the true system state. Instead, it infers the system state from its decision and observation history, which is encapsulated by the information state \(\pi_k = \{\lambda_x(k)\}_{x \in {\mathbb{X}}} \in \Pi({\mathbb{X}})\), where \(\lambda_x(k) \in [0, 1]\) denotes the conditional probability (given the decision and observation history) that the system is in state \(x \in {\mathbb{X}}\) at the beginning of slot k prior to the state transition. \(\Pi({\mathbb{X}}) = \{\{\lambda_x\}_{x \in {\mathbb{X}}} : \lambda_x \in [0,1], \sum_{x \in {\mathbb{X}}} \lambda_x = 1\}\) denotes the information space, which includes all possible probability mass functions on the state space \({\mathbb{X}}\).

At the end of slot k, the transmitter receives the observation \(Y_k = y_k\). The information state at the beginning of slot k+1 (prior to that slot’s state transition) is then obtained using Bayes’ rule:

$$ \lambda_x(k+1) = \frac{\sum_{x' \in {\mathbb{X}}} \lambda_{x'}(k) A_{x',x} B(y_k, x, a_k)}{\sum_{x'' \in {\mathbb{X}}} \sum_{x' \in {\mathbb{X}}} \lambda_{x'}(k) A_{x',x''} B(y_k, x'', a_k)}, \quad \forall x \in {\mathbb{X}}. $$
(10)

Given the information state \(\pi_k\), the distribution of the system state \(X_k\) in slot k after the state transition is given by

$$ \hbox{Pr}\{X_k = x\} = \sum_{x' \in {\mathbb{X}}} \lambda_{x'}(k)A_{x',x} \; \forall x \in {\mathbb{X}}. $$
(11)
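The prediction step in (11) and the belief update in (10) translate directly into code. The sketch below stores \(\pi_k\) as a list \([\lambda_{e_1}(k), \ldots, \lambda_{e_S}(k)]\) with the busy state last and takes the observation matrix \(B[x][y]\) for the action actually taken; the three-state example (two available gain levels plus the busy state) uses hypothetical entries for illustration only.

```python
def predict(belief, A):
    """Eq. (11): Pr{X_k = x} = sum over x' of lambda_x'(k) * A[x'][x]."""
    S = len(belief)
    return [sum(belief[xp] * A[xp][x] for xp in range(S)) for x in range(S)]


def update(belief, A, B, y):
    """Eq. (10): information state for slot k+1 after observing y in slot k."""
    prior = predict(belief, A)
    unnormalized = [prior[x] * B[x][y] for x in range(len(prior))]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation y has zero probability under this model")
    return [u / total for u in unnormalized]


# Hypothetical example: states (gamma_1, gamma_2, busy), epsilon = 0.6.
A = [[0.85, 0.10, 0.05],
     [0.10, 0.85, 0.05],
     [0.45, 0.45, 0.10]]
B = [[0.4 * 0.84, 0.4 * 0.16, 0.6],
     [0.4 * 0.16, 0.4 * 0.84, 0.6],
     [0.0, 0.0, 1.0]]

pi_k = [0.5, 0.5, 0.0]
print("predicted:", predict(pi_k, A))
print("after y = gamma_1:", update(pi_k, A, B, 0))
print("after y = gamma_S:", update(pi_k, A, B, 2))
```

In this example a gain acknowledgment rules out the busy state entirely, while a busy report only raises its probability from the predicted 0.05 to about 0.08, because with \(\epsilon = 0.6\) an idle channel is frequently reported busy.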

3.4 Cost and policy

From a user’s point of view, QoS at the application layer is more important than QoS at other layers. Therefore, we use multimedia distortion as the immediate cost in our scheme. The immediate cost in time slot k is defined as

$$ C_k = D(R, p(x_k, a_k), \beta(k)), $$
(12)

where \(R\) is the target bit rate and \(p(x_k, a_k)\) denotes the packet loss ratio when the system is in state \(x_k\) and composite action \(a_k\) is taken in time slot \(k\). We treat \(a_a(k) = 0\) (no access) as equivalent to 100% packet loss.

The expected total cost of the POMDP represents the overall distortion for a video sequence transmitted over K slots and can be expressed as

$$ J_\mu = {\mathbb{E}}_{\{\mu_s, \mu_{\epsilon \delta}, \mu_a, \mu_{\beta}\}} \left[\sum_{k=1}^{K} D(R, p(x_k, a_k), \beta(k))\right], $$
(13)

where \({\mathbb{E}}_{\{\mu_s, \mu_{\epsilon \delta}, \mu_a, \mu_{\beta}\}}\) denotes the expectation given that policies \(\mu_s\), \(\mu_{\epsilon\delta}\), \(\mu_a\), and \(\mu_\beta\) are employed.

A channel sensing policy \(\mu_s\) specifies the channel to sense, \(a_s\). A sensor operating policy \(\mu_{\epsilon\delta}\) specifies a spectrum sensor design \((\epsilon, \delta) \in {\mathbb{A}}_{\epsilon \delta}\) based on the tolerable probability of collision \(\zeta\). An access policy \(\mu_a\) specifies the access decision \(a_a \in \{0,1\}\). An intra refreshing policy \(\mu_\beta\) specifies the intra refreshing rate \(\beta \in {\mathbb{A}}_{\beta}\) based on the current information state \(\pi_k\).

3.5 Objective and constraint

We aim to jointly design an optimal policy for multimedia transmission over CR networks, \(\{\mu_s^*, \mu_{\epsilon \delta}^*, \mu_a^*, \mu_{\beta}^* \}\), that minimizes the expected total distortion over K slots subject to the collision constraint on \(P_c\):

$$ \{\mu_s^*, \mu_{\epsilon \delta}^*, \mu_a^*, \mu_{\beta}^*\} = \hbox{arg} \min_{\mu_s, \mu_{\epsilon \delta}, \mu_a, \mu_{\beta}} {\mathbb{E}}_{\{\mu_s, \mu_{\epsilon \delta}, \mu_a, \mu_{\beta}\}} \left[\sum_{k=1}^{K} D(R, p(x_k, a_k), \beta(k))\right] $$
(14)
$$ \begin{aligned} & \hbox{subject to} \\ & \quad P_c(k) = \hbox{Pr}\{a_a(k) = 1|X_k = e_S\} \le \zeta, \quad 1 \le k \le K. \end{aligned} $$
(15)

3.6 Value function

Let \(J_k(\pi)\) be the value function, i.e., the minimum expected cost that can be obtained from slot k (\(1 \le k \le K\)) onward given the information state \(\pi_k = \pi\) at the beginning of slot k. If the secondary user takes action \(a_k\) and observes the acknowledgment \(Y_k = y_k\), the cost accumulated from slot k onward consists of the immediate cost \(C_k = D(R, p(x_k, a_k), \beta(k))\) and the minimum expected future cost \(J_{k+1}(\pi_{k+1})\), where \(\pi_{k+1} =\{ \lambda_x(k+1) \}_{x \in {\mathbb{X}}} =U({\pi_k}| a_k, y_k)\) is the updated knowledge of the system state after incorporating the action \(a_k\) and the acknowledgment \(y_k\) in slot k according to (10). The value function therefore satisfies the following recursion, and the optimal composite action in each slot is the one that attains the minimum:

$$ \begin{aligned} J_k(\pi_k) &= \min_{a_k \in {\mathbb{A}}} \sum_{x \in {\mathbb{X}}} \sum_{x' \in {\mathbb{X}}} \lambda_{x'}(k) A_{x',x} \sum_{y \in {\mathbb{Y}}} B(y, x, a_k) [D(R, p(x, a_k), \beta(k)) \\ &\quad + J_{k+1}(U(\pi_k|a_k, y))] , \quad 1 \le k \le K-1 \end{aligned} $$
(16)
$$ J_K(\pi_K) = \min_{a_K \in {\mathbb{A}}} \sum_{x \in {\mathbb{X}}} \sum_{x' \in {\mathbb{X}}} \lambda_{x'}(K) A_{x',x} \left[ \sum_{y \in {\mathbb{Y}}} B(y, x, a_K) D(R, p(x, a_K), \beta(K)) \right]. $$
(17)

The value function of an unconstrained POMDP with a finite action space is piecewise linear and concave for cost minimization (convex in the equivalent reward-maximization form), and it can be computed using linear programming techniques [32]. An excellent overview of computationally efficient algorithms that can be used to solve for the optimal policy is given in [23]. In general, the number of linear segments that characterize the value function can grow exponentially. In 1991, Lovejoy proposed an ingenious suboptimal algorithm for POMDPs [33]. Based on Lovejoy’s algorithm, the value function can be bounded from above and below, and efficient suboptimal solutions can be developed as in subsection V-D of [34]: by retaining only a subset of the piecewise linear segments that characterize the value function and discarding the others, one can reduce the computational complexity. Due to space limitations, we refer the reader to subsection V-D of [34] for details. Moreover, the POMDP can be solved offline during system initialization. During real-time multimedia transmission, a node only needs to evaluate the value function at the current information state according to (16) and update the information state according to (10), which introduces little computational complexity. Finally, by imposing structural assumptions on the transition probabilities, costs, and observation probabilities, one can prove in some cases that the optimal policy is a threshold policy [35].

3.7 Intra refreshing strategy

For a selected channel, the chosen \(\beta\) is the one that is optimal for the most likely available state under \(\pi_k\). Because the channel distortion in (4) grows without bound as \(p \to 1\), a busy or unaccessed channel (100% packet loss) has infinite distortion, and in that case \(\beta\) has no influence on the total distortion. Therefore, if the most likely state under \(\pi_k\) is the busy state, we select the \(\beta\) corresponding to the most likely available state. That way, if the information state suggests that the channel is busy when it is in fact available, the selected \(\beta\) minimizes the effect of this error.
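The rule above can be sketched in a few lines. The per-state table of optimal \(\beta\) values is hypothetical (in practice it would come from the R-D model of Sect. 2.1), and ties between equally likely available states are broken toward the lowest state index, a detail the text does not specify.

```python
def select_beta(belief, beta_table):
    """Intra refreshing rule of Sect. 3.7 (sketch).

    belief:     information state over S states, busy state e_S last.
    beta_table: optimal beta for each available state 1..S-1.
    The busy state is skipped: beta cannot affect the (infinite) distortion
    there, so the choice hedges toward the most likely available state.
    """
    available = range(len(belief) - 1)
    most_likely_available = max(available, key=lambda i: belief[i])
    return beta_table[most_likely_available]


BETA_TABLE = [0.40, 0.18, 0.15, 0.14]  # hypothetical per-state beta*, S - 1 = 4

# Available state 2 is the most likely state overall.
print(select_beta([0.05, 0.60, 0.20, 0.05, 0.10], BETA_TABLE))
# The busy state is most likely; available state 1 is the most likely available one.
print(select_beta([0.30, 0.05, 0.05, 0.00, 0.60], BETA_TABLE))
```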

4 Simulation results and discussions

To evaluate the performance of the proposed scheme, we carried out a set of simulation experiments using the ns-2 simulator. All simulations were run on a computer with Windows 7, an Intel Core 2 Duo P8400 CPU (2.26 GHz), and 4 GB of memory. The choice of the total number of time slots K in the dynamic programming depends on the convergence rate of the POMDP program, which is affected by the state transition probabilities, the observation probabilities, and the value functions [23, 32]. In our simulations, the POMDP program was run over a horizon of K = 200, which is a reasonable approximation of the infinite-horizon problem. We first consider a system with one channel in Sects. 4.1–4.3 and then a system with two channels in Sects. 4.4–4.5. In all figures, the curves represent average values, and the error bars represent 95% confidence intervals over 50 different instances (seeds).
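The text does not state how the 95% confidence intervals are computed. A common choice, assumed in the sketch below, is a Student-t interval over the per-seed average distortions, whose two-sided critical value for 50 seeds (49 degrees of freedom) is about 2.0096.

```python
import math
import statistics

# Two-sided 95% Student-t critical value for 49 degrees of freedom (50 seeds).
T_975_DF49 = 2.0096


def mean_and_ci95(samples, t_crit=T_975_DF49):
    """Return (mean, half-width) of a t-based 95% confidence interval.

    The default critical value assumes exactly 50 independent runs; pass a
    different t_crit for another number of runs.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two runs")
    mean = statistics.mean(samples)
    half_width = t_crit * statistics.stdev(samples) / math.sqrt(n)
    return mean, half_width
```

An error bar then spans mean ± half-width for each scheme and parameter setting.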

We compare the system performance in the following four cases: (1) using perfect knowledge of the system state to make optimal decisions, which is the best case possible; (2) making decisions based on the most likely state indicated by the information state, which is our proposed scheme; (3) making decisions based solely on the channel gain in the last acknowledgment; and (4) using a constant \(\beta\), which represents existing schemes that do not consider application layer QoS. Our goal is to compare the distortion of the different schemes rather than to determine the absolute distortion. We use an average distortion metric, defined as the average distortion over the time slots in which the channel is available and accessed. The video rate-distortion parameters remain constant for the duration of each simulation, and the same values are used in all simulations: \(D_s(R_s,0) = 74\), \(D_s(R_s,1) = 124\), \(\eta = 1.4\), \(a = 0.01\), \(b = 1.0\), and \(E[F_d(m, m-1)] = 100\).
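As a worked check (our derivation, not part of the reported results), substituting these values into (1), (4), and (5) gives a closed form that makes the tradeoff explicit; note that \(b = 1\) reduces the denominator \(1 - b + b\beta\) in (4) to \(\beta\):

$$ \begin{aligned} D_s(R_s,\beta) &= 74 + 50\,\beta\,(1.4\beta - 0.4) = 74 - 20\beta + 70\beta^2, \\ D_c(p,\beta) &= \frac{0.01}{\beta}\cdot\frac{p}{1-p}\cdot 100 = \frac{p}{(1-p)\,\beta}, \\ \frac{\partial D}{\partial \beta} &= 140\beta - 20 - \frac{p}{(1-p)\,\beta^2} = 0. \end{aligned} $$

Since \(\partial^2 D/\partial\beta^2 = 140 + 2p/((1-p)\beta^3) > 0\), the total distortion is convex for \(\beta > 0\), so a stationary point in \((0, 1]\) is the unique minimizer. As \(p \to 0\) it tends to \(\beta^* = 1/7 \approx 0.14\), the minimizer of \(D_s\) alone under these values, and for \(p = 0.1\) it is \(\beta^* \approx 0.17\); higher loss rates push \(\beta^*\) upward.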

4.1 Performance improvement

Figure 3 shows the average distortion of the different schemes. The number of states, S, comprises S − 1 quantized channel gains and one busy state. For simplicity, we derive the transition matrix from three probabilities: the probability that an available state stays in the same state, \(\hbox{Pr}\{X_{k+1} = v|X_k = v\}\); the probability of transitioning from an available state to the busy state, \(\hbox{Pr}\{X_{k+1} = z|X_k = v\}\); and the probability that the busy state stays busy, \(\hbox{Pr}\{X_{k+1} = z|X_k = z\}\), where \(v \in \{e_1, e_2, \ldots, e_{S-1}\}\) and \(z = e_S\) denote available and busy states, respectively. In this example, \(\hbox{Pr}\{X_{k+1} = v|X_k = v\} = 0.85\), \(\hbox{Pr}\{X_{k+1} = z|X_k = v\} = 0.05\), \(\hbox{Pr}\{X_{k+1} = z|X_k = z\} = 0.1\), \(\epsilon = 0.6\), and σ = 0.1.

From Fig. 3, we can see that when perfect knowledge of the channel state is available, the optimal decision can be made in every time slot, so method 1 has the lowest average distortion. The more realistic cases involve sensing and CSI errors. Our proposed method (method 2) uses the information state to select the most likely optimal decision and tracks the ideal case fairly closely. Methods 3 and 4 both perform worse than the proposed scheme, which illustrates the improvement of the proposed scheme over existing schemes. In addition, using a constant β (method 4) can be worse than making decisions based solely on the previous acknowledgment (method 3), which shows the need to consider application layer parameters and application QoS. Moreover, increasing the number of channel states changes the characteristics of the channel, so the likelihood that the underlying system is in a state where the constant β is optimal decreases. As a result, the performance of the constant-β scheme is not stable as the number of states increases. The application layer parameter, β, should therefore be adapted together with parameters at the lower layers.
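The three-parameter transition matrix described above can be built as in the sketch below. The text does not say how the remaining probability mass is distributed, so this sketch assumes it is spread uniformly: from an available state over the other available states, and from the busy state over all available states. The helper name `transition_matrix` is hypothetical; the ordering (\(e_1, \ldots, e_{S-1}\) available, \(e_S\) busy) follows the text.

```python
import numpy as np


def transition_matrix(S: int, p_vv: float, p_vz: float, p_zz: float) -> np.ndarray:
    """Return P with P[i, j] = Pr{X_{k+1} = e_j | X_k = e_i}.

    States 0..S-2 are quantized channel gains (available); state S-1 is busy.
    Assumption: mass not fixed by p_vv, p_vz, p_zz is spread uniformly.
    """
    if S < 3:
        raise ValueError("need at least two available states and one busy state")
    if not (0 <= p_vv and 0 <= p_vz and p_vv + p_vz <= 1 and 0 <= p_zz <= 1):
        raise ValueError("invalid probabilities")
    n_avail = S - 1
    P = np.zeros((S, S))
    # Probability of moving to each *other* available state.
    leftover = (1.0 - p_vv - p_vz) / (n_avail - 1)
    for i in range(n_avail):
        P[i, :n_avail] = leftover
        P[i, i] = p_vv          # stay in the same gain state
        P[i, S - 1] = p_vz      # primary user becomes active
    P[S - 1, :n_avail] = (1.0 - p_zz) / n_avail  # busy -> any available state
    P[S - 1, S - 1] = p_zz                       # busy stays busy
    return P


# Fig. 3 parameters with S = 5
P = transition_matrix(5, p_vv=0.85, p_vz=0.05, p_zz=0.1)
print(P.round(3))
```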

Fig. 3

Average distortion vs. the number of states in different schemes

4.2 Effects of the parameters in the state transition matrix

We evaluate how the parameters of the transition matrix affect the average distortion. The transition matrix can be selected based on channel fading and primary usage. We ignore the quantization errors caused by the limited number of states and assume that the actual channel gain matches the state channel gain. Figures 4 and 5 show the simulation results as \(\hbox{Pr}\{X_{k+1} = v|X_k = v\}\) and \(\hbox{Pr}\{X_{k+1} = z|X_k = v\}\) vary, respectively. In Fig. 4, there are 5 states, \(\epsilon = 0.6\), and \(\hbox{Pr}\{X_{k+1} = z|X_k = v\} = 0.05\). This example demonstrates the cognitive nature of the system: our proposed method (method 2) approaches the method that uses perfect knowledge of the channel state as \(\hbox{Pr}\{X_{k+1} = v|X_k = v\}\) approaches 1. That is, the performance improves as the system dynamics slow down, because the actual system state becomes easier to predict. Fig. 5 also uses 5 states, with \(\epsilon = 0.6\) and \(\hbox{Pr}\{X_{k+1} = v|X_k = v\} = 0.50\). From this figure, we can see that \(\hbox{Pr}\{X_{k+1} = z|X_k = v\}\) has little impact on the performance of the proposed method. The reason is that increasing \(\hbox{Pr}\{X_{k+1} = z|X_k = v\}\) increases the likelihood that the system transitions to the busy state, which has little effect on the average distortion because that metric is computed only over the time slots in which the channel is available and accessed.
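One way to see why slower dynamics help is to measure how uncertain the one-step prediction from a known available state is. The sketch below computes the entropy of that prediction for the Fig. 4 setting (5 states, \(\hbox{Pr}\{X_{k+1} = z|X_k = v\} = 0.05\)), again assuming the remaining mass is spread uniformly over the other available states. Entropy is our illustrative choice of uncertainty measure, not a metric used in the simulations.

```python
import math


def prediction_entropy(p_vv: float, p_vz: float = 0.05, S: int = 5) -> float:
    """Entropy (bits) of the one-step prediction from a known available state.

    Row structure: stay with p_vv, go busy with p_vz, and (assumption) spread the
    remainder uniformly over the other S - 2 available states.
    Lower entropy means the next state is easier to predict.
    """
    other = (1.0 - p_vv - p_vz) / (S - 2)
    row = [p_vv, p_vz] + [other] * (S - 2)
    return -sum(p * math.log2(p) for p in row if p > 0)


for p_vv in (0.5, 0.7, 0.9, 0.95):
    print(f"p_vv = {p_vv:.2f}: H = {prediction_entropy(p_vv):.3f} bits")
```

The entropy falls as the stay probability rises toward its maximum of 0.95, i.e., the next state becomes easier to predict.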

Fig. 4

Average distortion vs. the probability of staying in the same state

Fig. 5

Average distortion vs. the probability of transitioning to the busy state

4.3 Effects of the parameters in the observation matrix

The observation matrix is derived from the sensor operating point, \(\epsilon\), and the standard deviation of the receiver channel estimation error, σ. Figures 6 and 7 show how σ and \(\epsilon\) affect the average distortion. In Fig. 6, there are 5 states, \(\epsilon = 0.6\), \(\hbox{Pr}\{X_{k+1} = v|X_k = v\} = 0.85\), and \(\hbox{Pr}\{X_{k+1} = z|X_k = v\} = 0.05\). Fig. 6 shows that as the receiver channel estimation degrades, the acknowledgment provides less information about the actual channel gain, and the average distortion of our method increases. \(\epsilon\) and δ are related through the sensor receiver operating characteristic (ROC), so adjusting \(\epsilon\) implies a change to the system's probability of collision requirement. In Fig. 7, \(\hbox{Pr}\{X_{k+1} = v|X_k = v\} = 0.85\), \(\hbox{Pr}\{X_{k+1} = z|X_k = v\} = 0.05\), and σ = 0.1. This figure shows that the average distortion increases as the probability of false alarm, \(\epsilon\), increases.
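The text does not give the exact form of the acknowledgment part of the observation matrix. As an illustration only, the sketch below assumes the acknowledgment reports the true channel gain plus zero-mean Gaussian error with standard deviation σ, and that the estimate is mapped to the nearest quantized gain. The gain levels and the helper `ack_observation_matrix` are hypothetical, and the sensor part (\(\epsilon\), δ) is not modeled here.

```python
import math
from typing import List, Sequence


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function; works for x = +/-inf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ack_observation_matrix(levels: Sequence[float], sigma: float) -> List[List[float]]:
    """O[i][j] = Pr{acknowledgment maps to level j | true gain is level i}.

    Assumptions: levels are sorted ascending; the reported gain is the true gain
    plus N(0, sigma^2) error; decision boundaries lie midway between levels.
    """
    edges = [-math.inf] + [(a + b) / 2 for a, b in zip(levels, levels[1:])] + [math.inf]
    matrix = []
    for g in levels:
        matrix.append([normal_cdf((hi - g) / sigma) - normal_cdf((lo - g) / sigma)
                       for lo, hi in zip(edges, edges[1:])])
    return matrix


levels = [0.25, 0.5, 0.75, 1.0]  # hypothetical quantized channel gains
for sigma in (0.05, 0.1, 0.2):
    O = ack_observation_matrix(levels, sigma)
    # Diagonal = probability the acknowledgment points to the correct gain.
    print(sigma, [round(O[i][i], 3) for i in range(len(levels))])
```

Under these assumptions the diagonal entries shrink as σ grows, which is the loss of information about the channel gain described above.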

Fig. 6

Average distortion vs. the receiver channel estimation standard deviation, σ

Fig. 7

Average distortion vs. the sensor operating point, \(\epsilon\)

4.4 Effects of the transition matrix on channel selection policy

We consider a system with N = 2 channels and S = 3 states to evaluate the performance of the channel selection policy. We use a spectrum utilization (SU) metric to evaluate the sensing policy. SU is the percentage of time slots in which an available channel was selected for sensing. SU is an important parameter when evaluating video QoS, because the channel distortion is infinite when a channel is busy or not accessed. Improving SU reduces the percentage of time slots in which a busy channel is selected for sensing, thus improving the application layer QoS. The application layer QoS is improved in two steps. First, we select a channel to maximize SU, thus reducing the large distortion introduced when the channel is unavailable. Second, for an available and accessed channel, we select the intra refreshing rate that minimizes the distortion for the given channel gain.
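The SU metric defined above can be computed from a slot-by-slot trace as in this minimal sketch; the function `spectrum_utilization` and the trace format are hypothetical.

```python
from typing import Sequence


def spectrum_utilization(selected: Sequence[int],
                         available: Sequence[Sequence[bool]]) -> float:
    """Percentage of slots in which the channel selected for sensing was available.

    selected[k] is the index of the channel sensed in slot k;
    available[k][n] is True if channel n was actually free in slot k.
    """
    if not selected:
        raise ValueError("no time slots")
    hits = sum(1 for k, n in enumerate(selected) if available[k][n])
    return 100.0 * hits / len(selected)


# Hypothetical 4-slot trace for N = 2 channels
avail = [(True, False), (False, True), (True, True), (False, False)]
print(spectrum_utilization([0, 0, 1, 0], avail))  # 50.0
```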

The two channels, channel 1 and channel 2, are simulated with the same number of states (i.e., quantized channel gains) and the same observation probabilities, but with asymmetric transition probabilities. Channel 2 has higher primary usage than channel 1. Based on previous observations and actions and on the POMDP-derived policy, the secondary transmitter/receiver pair dynamically selects the channel that will most likely maximize application layer QoS.

We evaluate SU and average distortion performance for three cases: (1) POMDP channel selection, which is our proposed scheme; (2) randomly selecting channel 1 or 2 and using a constant β = 0.1, which represents a non-adaptive scheme; and (3) using perfect knowledge of the system state, which represents the ideal case.

SU performance with varying transition matrix parameters is shown in Figs. 8 and 9. In both plots, we vary only the transition matrix parameters of channel 1. Both channels have equal observation matrix parameters, \(\epsilon_1 = \epsilon_2 = 0.62\) and \(\sigma_1 = \sigma_2 = 0.1\).

Fig. 8

Two Channel Scenario: spectrum utilization vs. the probability of staying in the busy state of channel 1

In Fig. 8, we vary the probability that channel 1 stays busy, \(\hbox{Pr}\{X_{k+1}^1 =z|X_k^1 =z\}\), with \(\hbox{Pr}\{X_{k+1}^1 =z|X_k^1 =v\} =0.2\), \(\hbox{Pr}\{X_{k+1}^2 =z|X_k^2 =z\} = 0.8\), and \(\hbox{Pr}\{X_{k+1}^2 =z|X_k^2 =v\} =0.6\). In Fig. 9, we vary the probability that channel 1 transitions to the busy state, \(\hbox{Pr}\{X_{k+1}^1 =z|X_k^1 =v\}\), with \(\hbox{Pr}\{X_{k+1}^1 = z|X_k^1 = z\} = 0.4\), \(\hbox{Pr}\{X_{k+1}^2 = z|X_k^2 = z\} = 0.8\), and \(\hbox{Pr}\{X_{k+1}^2 = z|X_k^2 = v\} = 0.6\).
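For intuition about these SU values, note that in this parameterization every available state moves to busy with the same probability, so each channel's availability behaves as a two-state (available/busy) Markov chain with long-run availability \((1 - p_{zz})/((1 - p_{zz}) + p_{vz})\). The sketch below uses this to compute, for each channel-1 setting, the long-run SU of random selection, of always sensing the better channel, and an upper bound for perfect knowledge, assuming the two channels evolve independently. These are stationary-analysis values under stated assumptions, not the simulated results, and the sweep values for channel 1 are hypothetical.

```python
def stationary_availability(p_zz: float, p_vz: float) -> float:
    """Long-run fraction of slots a channel is available.

    Two-state chain: available -> busy with p_vz, busy -> busy with p_zz.
    Balance equation: pi_v * p_vz = pi_z * (1 - p_zz).
    """
    return (1.0 - p_zz) / ((1.0 - p_zz) + p_vz)


pi2 = stationary_availability(p_zz=0.8, p_vz=0.6)  # channel 2 in Figs. 8 and 9
for p_zz1 in (0.2, 0.4, 0.6, 0.8):                  # hypothetical Fig. 8 sweep
    pi1 = stationary_availability(p_zz1, p_vz=0.2)
    random_su = 100 * (pi1 + pi2) / 2               # pick a channel uniformly
    best_fixed_su = 100 * max(pi1, pi2)             # always sense the better one
    perfect_su = 100 * (1 - (1 - pi1) * (1 - pi2))  # any free channel (independence)
    print(f"p_zz1={p_zz1}: random {random_su:.1f}%, "
          f"best fixed {best_fixed_su:.1f}%, perfect {perfect_su:.1f}%")
```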

Fig. 9

Two Channel Scenario: spectrum utilization vs. the probability of transitioning to the busy state of channel 1

In both cases, the SU of our scheme is greater than that of the non-adaptive scheme. Our proposed scheme senses the surrounding environment to learn and adapt its channel selection. However, it takes several time slots for the policy to learn the system state, so the performance of our scheme improves with slower transition dynamics. That is, our scheme approaches the perfect case as \(\hbox{Pr}\{X_{k+1}^1 = z|X_k^1 = v\}\) approaches 0, as shown in Fig. 9. Our scheme provides closer to optimal performance when there is a large difference in availability between the two channels, because it becomes easier to distinguish the better channel. This is demonstrated in Fig. 8, where our scheme is closer to optimal when \(\hbox{Pr}\{X_{k+1}^1 = z | X_k^1 = z\}\) is low relative to \(\hbox{Pr}\{X_{k+1}^2 = z | X_k^2 = z\}\). In Fig. 10, we show the average distortion as the probability that channel 1 stays busy varies. The average distortion of our scheme is lower than that of the non-adaptive scheme, and the transition matrix parameters have little effect on the average distortion. Our scheme outperforms the non-adaptive scheme because it selects the channel with the better channel gain and adapts the intra refreshing rate to the selected channel.
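The policy in the text is derived from the POMDP; the sketch below substitutes a much simpler myopic rule to show how tracking a per-channel availability belief can beat random selection. It assumes perfect sensing of the selected channel, independent two-state channels, channel 1 with a busy-stay probability of 0.4 and a hypothetical available-to-busy probability of 0.2, and channel 2 with 0.8 and 0.6 as in Figs. 8 and 9. The function `simulate_su` is hypothetical.

```python
import random
from typing import Sequence, Tuple


def simulate_su(policy: str, params: Sequence[Tuple[float, float]],
                slots: int = 20000, seed: int = 1) -> float:
    """SU (%) of a channel-selection rule; params[n] = (p_zz, p_vz) for channel n.

    'belief' senses the channel with the highest predicted availability (myopic,
    not the POMDP policy); 'random' picks a channel uniformly.
    """
    rng = random.Random(seed)
    n_ch = len(params)
    pi = [(1 - zz) / ((1 - zz) + vz) for zz, vz in params]  # stationary availability
    state = [rng.random() < p for p in pi]                  # True = available
    belief = list(pi)                                       # Pr{channel n available}
    hits = 0
    for _ in range(slots):
        # True channel states evolve one step.
        state = [rng.random() < ((1 - vz) if s else (1 - zz))
                 for s, (zz, vz) in zip(state, params)]
        # Beliefs are propagated through the same two-state chain.
        belief = [b * (1 - vz) + (1 - b) * (1 - zz)
                  for b, (zz, vz) in zip(belief, params)]
        if policy == "belief":
            n = max(range(n_ch), key=lambda i: belief[i])
        elif policy == "random":
            n = rng.randrange(n_ch)
        else:
            raise ValueError(f"unknown policy: {policy}")
        hits += state[n]
        belief[n] = 1.0 if state[n] else 0.0  # perfect sensing reveals channel n
    return 100.0 * hits / slots


params = [(0.4, 0.2), (0.8, 0.6)]  # (Pr{z->z}, Pr{v->z}) for channels 1 and 2
for policy in ("belief", "random"):
    print(policy, round(simulate_su(policy, params), 1))
```

With these parameters the myopic rule always senses channel 1: its predicted availability is 0.8 after an available observation and 0.6 after a busy one, while channel 2's belief stays at 0.25. Its SU therefore stays near channel 1's long-run availability of 75%, while random selection stays near 50%.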

Fig. 10

Two Channel Scenario: average distortion vs. the probability of staying in the busy state of channel 1

4.5 Effects of the observation matrix on channel selection policy

SU with varying sensor operating point is shown in Fig. 11, with \(\hbox{Pr}\{X_{k+1}^1 = z | X_k^1 = z\} = 0.4\), \(\hbox{Pr}\{X_{k+1}^1 = z | X_k^1 = v\} = 0.15\), \(\hbox{Pr}\{X_{k+1}^2 = z | X_k^2 = z\} = 0.6\), and \(\hbox{Pr}\{X_{k+1}^2 = z | X_k^2 = v\} = 0.2\). Observation parameters are determined by the operating characteristics of the secondary users and are unlikely to differ between channels. Thus, both channels are simulated with symmetric observation parameters, \(\epsilon_1 = \epsilon_2 = \epsilon\) and \(\sigma_1 = \sigma_2 = \sigma\). In Fig. 11, we vary the sensor operating point \(\epsilon\). In Fig. 12, we show the average distortion as \(\epsilon\) varies. The observation parameters have little effect on the SU and average distortion performance of our proposed scheme.

Fig. 11

Two Channel Scenario: spectrum utilization vs. the receiver channel estimation standard deviation, σ

Fig. 12

Two Channel Scenario: average distortion vs. the receiver channel estimation standard deviation, σ

These simulation results demonstrate some interesting trends in the design and optimization of CR networks from a cross-layer design perspective. Adaptively adjusting the intra refreshing rate to accommodate time-varying wireless channels is an effective way to reduce distortion. By using all previous actions and observations, we can build an information state that becomes more accurate over time. The performance of using the information state to select the intra refreshing rate improves as the system dynamics slow down. In a CR environment, the MAC access strategy depends on the accuracy of the spectrum sensor. The total distortion is limited by the availability of the channel, so distortion performance will degrade if primary usage increases or if a very low tolerable probability of collision is required.
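The information state mentioned above can be updated with a standard POMDP Bayes filter: predict with the transition matrix, weight by the likelihood of the new observation, and renormalize. The sketch below shows this update and the most-likely-state decision used by method 2. The 3-state transition matrix follows the Fig. 3 parameters with the remaining mass spread uniformly (an assumption); the observation matrix, the abstract observation indices, and the helper `update_information_state` are hypothetical.

```python
import numpy as np


def update_information_state(b: np.ndarray, P: np.ndarray, O: np.ndarray,
                             obs: int) -> np.ndarray:
    """One Bayes-filter step of the information state.

    b[i]    : current Pr{X_k = e_i}
    P[i, j] : Pr{X_{k+1} = e_j | X_k = e_i}
    O[j, o] : Pr{observation o | X_{k+1} = e_j}
    Returns Pr{X_{k+1} = e_j | all past actions and observations}.
    """
    predicted = b @ P                     # propagate through the transition matrix
    unnormalized = predicted * O[:, obs]  # weight by the observation likelihood
    total = unnormalized.sum()
    if total == 0:
        raise ValueError("observation has zero probability under the model")
    return unnormalized / total


# States: two quantized gains (e_1, e_2) and busy (e_3).
P = np.array([[0.85, 0.10, 0.05],
              [0.10, 0.85, 0.05],
              [0.45, 0.45, 0.10]])
O = np.array([[0.7, 0.2, 0.1],   # rows: true state, columns: observation
              [0.2, 0.7, 0.1],
              [0.1, 0.1, 0.8]])
b = np.full(3, 1 / 3)
for obs in [0, 0, 0]:
    b = update_information_state(b, P, O, obs)
    print(b.round(3), "most likely state:", int(np.argmax(b)))
```

Repeated observations that favor \(e_1\) push the belief toward that state, illustrating how the information state sharpens over time.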

5 Conclusions and future work

In this paper, we have presented an integrated approach for multimedia transmission over cognitive radio networks. An important application layer parameter, the intra refreshing rate, can be adjusted together with parameters at other layers based on the channel conditions sensed by the secondary users. A low-complexity dynamic programming framework was presented to obtain the optimal intra refreshing policy. By modeling the system as a Markov process, we have derived a POMDP for optimal channel selection that minimizes distortion while improving spectrum efficiency. Simulation results demonstrated the performance gain of the adaptive transmission scheme. Work is in progress to consider other QoS metrics at the application layer.