1 Introduction

During the last decades, cellular mobile networks have transitioned from simple telephony networks to networks that deliver many types of multimedia content, which require large bandwidths. In addition, the number of subscribers to wireless mobile networks keeps increasing: according to Cisco, there will be 11.6 billion mobile-connected devices by 2021 [1], which will trigger a significant demand for additional spectrum resources.

1.1 Spectrum sharing: emerging solution for spectrum scarcity

Recent studies show that spectrum shortage results from outdated spectrum management policies rather than from physical scarcity. Today, radio spectrum is exclusively and statically allocated to different technologies, which leaves very few bands available for fixed frequency assignments. At the same time, while the frequency bands allocated to certain technologies, such as cellular networks, are heavily congested, other bands (e.g. military and TV bands) are only sporadically used. These contrasting observations show that, under current allocation policies, the telecom industry is facing the prospect of a looming spectrum crunch. Since radio spectrum is the lifeblood of the telecommunication market, the research community has worked intensively on agile frequency allocation policies. Approaches based on spectrum sharing between licensed and unlicensed traffic, also referred to as cognitive radio (CR) [2], are actively pursued today to drive the development of diversified use cases. New players are allowed to use the licensed frequency bands according to a predefined agreement with the incumbent users, provided that the cognitive device, otherwise known as a secondary user (SU), does not cause harmful interference to the license holder, the so-called primary user (PU).

Depending on the knowledge needed to coexist with the primary network, cognitive radio spectrum access techniques fall into three classes: interweave, underlay and overlay [3]. The interweave paradigm, which was the original motivation for cognitive radio technology, has SUs continuously monitor the spectrum, detect unused bands and transmit over these spectrum holes without disturbing PUs. The underlay technique allows SUs to transmit simultaneously with PUs as long as the interference caused by the secondary transmitter at the primary receiver remains below a predefined threshold, which ensures peaceful coexistence. The overlay approach also allows concurrent primary and secondary transmissions: SUs must know the PUs' transmitted data sequences and the associated codebooks, and use this information with sophisticated signal processing and coding techniques to maintain the performance of the primary link while obtaining supplementary bandwidth for their own communication. Advanced levels of cognition can be obtained by combining the above paradigms.

According to Ericsson ConsumerLab statistics from 2015 [4], only 34% of consumers were satisfied with their indoor connectivity when watching TV or streaming videos, and even fewer (10%) were satisfied with their outdoor connectivity for the same data-intensive activities. These measurements show that network performance has become a serious bottleneck for telecom operators and will be a factor of paramount importance for consumer engagement in the coming years. To bridge the gap between theory and practice and meet rising customer expectations, recent years have witnessed strong interest in the technical standardization of frequency-agile cellular networks [5]. Several spectrum sharing paradigms have been introduced in 4G: LTE unlicensed (LTE-U) and licensed assisted access (LAA), which combine licensed spectrum with shared or unlicensed spectrum such as the ISM bands; LTE Wi-Fi aggregation (LWA), which aggregates across technologies by combining LTE and Wi-Fi signals; MulteFire, which enables high-performance cellular technology to operate stand-alone in unlicensed spectrum; and the Citizens Broadband Radio Service (CBRS), where multiple deployments share spectrum with a higher-priority incumbent. This trend is expected to continue in the coming years with the advent of 5G and beyond-5G networks.

Although millimeter waves have great potential to support multi-gigabit communications, CR cellular technology can also gain access to additional bandwidth, spanning hundreds of MHz to many GHz, through the dynamic spectrum sharing (DSS) concept. Spectrum sharing-based networks are therefore expected to shape the future of multimedia content delivery, a quality-hungry and spectrum-consuming use case. However, multimedia communications are particularly sensitive to dynamic traffic load, to contending secondary devices with heterogeneous capabilities, and to channel failures caused by the sudden reappearance of license holders. Moreover, the low secondary transmit power regime that protects the incumbent users' payload data forces the secondary device to operate at lower data rates. Advanced coding strategies are therefore highly recommended in such contexts; two solutions are often advocated to deal with such dynamic environments, namely layered coding and multiple description coding (MDC).

Fig. 1

System architecture: \(f_{\mathrm {TV}}\) is a TV channel and \(\left( f_{\mathrm {c}_{i}} \right) _{1\le i \le 4}\) are cellular frequencies in the same TV broadcasting zone

1.2 Problem formulation and solution strategy

This paper investigates how layered coding and MDC can be combined effectively, since both schemes are known to be robust to changing and rapidly varying conditions. The network architecture adopted in this paper enables spectrum sharing between a secondary cellular network and a primary Digital Video Broadcasting-Terrestrial (DVB-T) system [6]. We consider a cognitive cellular network [7] capable of aggregating licensed radio resources (LRRs) and cognitive radio resources (CRRs) into one holistic, unified system at the radio resource management (RRM) module. More importantly, besides the conventional cellular frequencies, the cognitive base station (CBS) can exploit TV white spaces (TVWS) using a combined interweave and underlay strategy without disturbing the TV receivers. TVWS are used only when no more free radio resources are available to serve new requests. We study a scalable video streaming scenario in which the CBS transmits a video content to a cognitive user, as depicted in Fig. 1. We adopt the well-known scalable video coding (SVC) extension of the H.264/AVC standard [8]. In particular, the original video is split into two descriptions composed of the odd and even frames, respectively; the odd description is then encoded into the base layer and the even description into the enhancement layer, with inter-layer prediction switched on. This even/odd frame separation transforms a scalable source bit stream into a robust multiple description stream. Meanwhile, the coding parameters are optimally adjusted to the environmental conditions and to network and application constraints. At the receiving end, the video can be recovered up to a quality commensurate with the number of descriptions received.
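The even/odd split that turns one source sequence into two descriptions can be sketched in a few lines of Python. This is only an illustration of the frame partitioning, assuming frames are held in a list and numbered from 1 as in the text; the function names are hypothetical, and the SVC encoding of each description (odd frames to the base layer, even frames to the enhancement layer) is not reproduced here.

```python
from typing import List, Sequence, Tuple


def split_even_odd(frames: Sequence) -> Tuple[list, list]:
    """Split a frame sequence into two temporal descriptions.

    Frames are numbered from 1, as in the text: frames 1, 3, 5, ... form the
    odd description (fed to the base layer) and frames 2, 4, 6, ... form the
    even description (fed to the enhancement layer).
    """
    odd = list(frames[0::2])   # Python index 0 holds frame 1
    even = list(frames[1::2])  # Python index 1 holds frame 2
    return odd, even


def merge_even_odd(odd: Sequence, even: Sequence) -> list:
    """Re-interleave both descriptions into the full-frame-rate sequence."""
    merged: List = []
    for k in range(max(len(odd), len(even))):
        if k < len(odd):
            merged.append(odd[k])
        if k < len(even):
            merged.append(even[k])
    return merged


if __name__ == "__main__":
    frames = ["f1", "f2", "f3", "f4", "f5"]
    odd, even = split_even_odd(frames)
    print(odd, even)  # ['f1', 'f3', 'f5'] ['f2', 'f4']
    assert merge_even_odd(odd, even) == frames
```

When both descriptions arrive, re-interleaving restores the full temporal resolution; when only one arrives, the receiver still holds a half-frame-rate version of the video.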
The results are reported in terms of two video quality assessment metrics: peak signal-to-noise ratio (PSNR) and multi-scale structural similarity (MS-SSIM). The SSIM index [9] was proposed to improve on conventional metrics such as PSNR; it is a perception-based method that treats video degradation as a perceived change in structural information. SSIM was chosen because it is widely recognized as providing objective scores closely aligned with subjective testing. MS-SSIM extends SSIM by performing SSIM evaluations at multiple scales, which covers a wider range of impairments and losses in the video feed.
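For reference, the PSNR of a decoded frame follows from its mean squared error with respect to the original frame. The minimal sketch below assumes 8-bit samples (peak value 255) and frames flattened into plain lists; it is not the evaluation tool used in this paper, and MS-SSIM, which requires multi-scale filtering, is not sketched here.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], distorted: Sequence[float], peak: float = 255.0) -> float:
    """PSNR in dB between two equally sized frames given as flat pixel lists.

    peak = 255 assumes 8-bit samples; identical frames give +inf.
    """
    if len(reference) != len(distorted) or len(reference) == 0:
        raise ValueError("frames must be non-empty and have the same size")
    # Mean squared error over all pixels of the frame.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak * peak / mse)


if __name__ == "__main__":
    ref = [52, 55, 61, 59]
    dist = [54, 55, 60, 62]
    print(round(psnr(ref, dist), 2))
```

A sequence-level score is usually obtained by averaging per-frame values, with some convention needed for lossless frames that yield an infinite PSNR.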

The remainder of this paper is organized as follows. Section 2 surveys the relevant recent literature on this topic. Section 3 analyzes the system architecture considered in this study and derives an analytical expression for the overall secondary system capacity. Section 4 describes the proposed video coding framework in detail and computes the optimal coding parameters for both the base and enhancement layers. Numerical simulations are conducted in Sect. 5, and the accompanying discussion corroborates the preceding theoretical analysis. Finally, Sect. 6 concludes the paper.

2 Related works and background

In recent studies, few research efforts have developed practical proposals demonstrating the capabilities of MDC frameworks in CR contexts.

Kushwaha, Xing, Chandramouli and Subbalakshmi presented in [10] a comprehensive overview of coding techniques and their application in a CR context. More importantly, the paper invoked the MDC concept as a relevant technique to combat transmission failures in CR networks. However, it does not provide any numerical results to corroborate the theoretical analysis for the MDC case. In [11], Husheng investigated the use of MDC for delay-sensitive and distortion-tolerant secondary communications subject to fluctuations caused by the primary traffic reclaiming the channel. The paper proposed an algorithm that formulates the selection of MDC rates and distortions as an optimization problem for a given utility, and image transmission was considered for the numerical simulations. However, the paper neglected the substantial effects of fading and noise, which lead to noticeable image degradation. Moreover, more attractive scenarios (e.g. video) remain to be discussed.

The use of MDC schemes in CR networks was discussed in depth in [12, 13] and [14].

A specific packetization framework for MDC inspired by the priority encoding transmission (PET) technique of Albanese et al. [15] was derived in [12], and the PSNR of the final stream was computed and optimized as a function of the loss pattern. The paper [13] concentrated on a recurring problem in cognitive cellular infrastructures, namely the non-reciprocity between the uplink and the downlink, both in interfering with the primary network and in mutual collisions. The paper argued that spectral efficiency can be enhanced through the use of MDC. A general cross-layer model is proposed in [14] to ensure end-to-end multimedia content delivery in realistic cognitive radio contexts characterized by random returns of licensed users as well as channel impairments. A recent work [16] implements an unequal power allocation mechanism in an IEEE 802.11a context, in which the coded bits with the largest contribution to the overall image quality are granted higher power levels. The optimal solution is obtained by minimizing the image distortion while complying with the interference constraint of the primary system. Again, these studies tackled the intuitive case of image transmission. A broad survey of network coding schemes in cognitive radio networks is provided in [17], which outlines their motivations and applications; in particular, the paper invoked MDC as an efficient means of enhancing spectral efficiency in CR networks. Adaptive rateless coding has been employed to increase the throughput of secondary communication in a multi-channel, multi-user cognitive radio network under the assumption that some input data packets are known at the decoder [18]. However, the impact of this scheme on the perceived quality of the received file has not been examined for multimedia communications, which are highly loss-sensitive.
An in-depth study of multimedia services over various kinds of CR-based wireless networks is presented in [19]; the survey outlines state-of-the-art research on design requirements and discusses key questions regarding the support and performance of multimedia applications in such dynamic environments. However, MDC-based designs are not considered.

The video transmission scenario was first tackled in [20], with a special emphasis on applications requiring timely delivery of information with a tolerable amount of distortion in the interweave mode. The paper addresses the pertinent question of whether MDC always performs better than standard equal loss protection schemes in CR networks. A PET-based MDC design was adopted in 60 GHz networks to stream uncompressed high-definition (HD) videos [21], wherein the different bits of each pixel are optimally partitioned according to their contribution to the overall quality. Interleaving is a basic component of this scheme to combat bursty errors; however, it is time-consuming and increases the latency of the multimedia stream. Scalable video streaming over underlay CR networks was studied in [22], where an MDC scheme relying on the separation of even and odd frames was employed to enhance the error resilience of the conveyed video base layer. The paper compares single and multiple description coding in terms of average PSNR.

As this review shows, state-of-the-art papers restrict cognitive radio to simple sharing methods, either interweave or underlay, and do not leverage the benefits of mixed spectrum sharing strategies. They also tend to omit the effects of fading and noise over the wireless medium, focusing instead on the impact of primary traffic interruptions. Furthermore, most of the aforementioned papers use PSNR as the video quality measure, although PSNR correlates poorly with subjective quality as perceived by the human visual system.

3 System description

3.1 General analysis

5G and beyond-5G networks are expected to deliver higher data rates and self-adaptation to the surrounding environment, and to support a massive number of ubiquitously connected devices with low latency, energy savings and reduced complexity. Unlicensed and shared spectrum access is among the potential candidates for implementation in next-generation mobile networks.

The TVWS carriers released by the digital switch-over open up new opportunities for spectrum deregulation, so that players such as 5G operators, referred to as secondary users, can be scheduled to transmit in a non-interfering mode on the vacant TV channels, whose license holders (the TV broadcasters) are known as primary users. On this basis, different telecommunication stakeholders can restore margins and earn revenues while decreasing CAPEX and OPEX.

TV white spaces are exploited to enlarge the pool of available radio resource blocks (RRBs) and ensure continuous service provision in future cellular networks, even when the licensed resources are fully occupied. This functionality can be handled by an evolved RRM entity in new 5G equipment that integrates LRR and CRR allocation into one holistic stack, which requires incorporating additional frequency-agile capabilities into RRM frameworks.

The keen interest in such an architecture stems from multiple motivations, essentially related to the extra capacity offered to upcoming 5G cellular networks and the opportunity to seize new services and business opportunities: (1) the transition of TV from analog to digital broadcasting has freed up large spectrum chunks, around 100 MHz depending on the country, which are left unused; (2) the DVB-T network follows a frequency reuse pattern with a reuse factor larger than one, which leaves some spectrum bands unused in specific locations, so a cognitive base station located far away from the TV transmitter can reuse the associated frequency band without affecting the TV receivers; (3) digital TV broadcasting uses a very high transmission power compared with cellular networks, which makes underlay cognitive transmission technically feasible and well suited to such contexts.

In this article, we consider a CBS transmitting a scalable video content to a cognitive user, as depicted in Fig. 1. We assume that the CBS accesses the CRRs only if the available LRRs are not sufficient to deliver all the video layers, namely, the base layer and the enhancement layers.

The proposed cognitive cellular network (CCN) combines spectrum sensing with geographic coordination through a central database to obtain awareness of the spectrum occupancy map over multiple dimensions, such as time and space, in order to detect and access the vacant channels in the TV bands. Hence, the CBS can determine the unused channels (white spaces) in a specific service area and transmit on them using the interweave approach. Otherwise, if no vacant channel is detected, the system switches to the underlay transmission mode and exploits the underutilized spectrum bands. We assume that the risk of the primary user resuming its broadcasting activity during the transmission period \((1-\alpha )T\) is negligible, so no cognitive session is prematurely terminated. Figure 2 represents the hybrid transmission scheme used in this paper.

Fig. 2

Hybrid interweave and underlay transmission scheme

3.2 Extended capacity for secondary communications

A wireless multimedia communication between a cognitive base station ST and a cognitive device SR takes place within the range of a nearby primary TV transmitter-receiver pair (PT and PR) according to the proposed spectrum sharing strategy (Fig. 2), as illustrated in Fig. 3. A set of contiguous and/or noncontiguous channels scattered over the cellular and TV spectrum is selected to constitute a common pool, where \(W_{\mathrm {p}}\) (\(W_{\mathrm {s}}\), respectively) is the TV (cellular, respectively) channel bandwidth. The cellular transmission divides time into fixed-size frames of duration T. When the cellular carriers become congested, each frame is partitioned into two parts, a sensing phase and a transmission phase, with normalized durations \(\alpha \) and \(1-\alpha \), respectively (Fig. 2). Physical obstructions to signal propagation, principally buildings and hills, force the network nodes to operate in non-line-of-sight (NLOS) conditions; thus all wireless channels in this environment are modeled as Rayleigh fading channels with a frequency-flat block fading process. We also assume that both the secondary transmitter and the secondary receiver are affected by additive white Gaussian noise (AWGN) with variance \(N_{\mathrm {s}}\). The CBS has perfect knowledge of the channel state information (CSI).

Fig. 3

Cognitive radio channel model

Denote the channel gain from the primary transmitter to the primary receiver as \(g_{\mathrm {PP}}\). Let \(g_{\mathrm {TT}}\) be the channel gain between the primary transmitter and the secondary transmitter, \(g_{\mathrm {SP}}\) the channel gain between the secondary transmitter and the primary receiver, and \(g_{\mathrm {PS}}\) the channel gain between the primary transmitter and the secondary receiver. The channel gain of the secondary link \(\mathrm {ST}-\mathrm {SR}\) is denoted as \(g_{\mathrm {SS}}\) (see Fig. 3). The TV transmitter is located at a distance \(d_{\mathrm {TT}} \approx d_{\mathrm {PS}} \approx d_{\mathrm {PP}}\) from the nearest cell, \(d_{\mathrm {SP}}\) is the distance from the base station to the nearby TV receiver, and \(d_{\mathrm {SS}}\) is the distance between the secondary transmitter-receiver pair. Incorporating the distance-dependent path loss, the channel coefficients are modeled as \(g_{\mathrm {xy}} = d_{\mathrm {xy}}^{-\gamma /2}h_{\mathrm {xy}}\), where \(h_{\mathrm {xy}} \sim \mathcal {CN}(0,\,\sigma )\) for \(xy \in \lbrace \mathrm {TT}, \mathrm {PP}, \mathrm {SS}, \mathrm {SP}, \mathrm {PS}\rbrace \) and \(\gamma \) is the path loss exponent. The notation |.| denotes the magnitude.
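The channel model can be illustrated by drawing one coefficient \(g_{\mathrm {xy}} = d_{\mathrm {xy}}^{-\gamma /2}h_{\mathrm {xy}}\) at a time. The sketch below is a hypothetical helper, not part of the simulation setup of this paper; it assumes the usual convention that a \(\mathcal {CN}(0,\sigma )\) variable has independent real and imaginary parts of variance \(\sigma /2\), so that the average power gain is \(\sigma d_{\mathrm {xy}}^{-\gamma }\).

```python
import math
import random


def channel_gain(distance: float, gamma: float, sigma: float, rng: random.Random) -> complex:
    """Draw g_xy = d_xy^(-gamma/2) * h_xy with h_xy ~ CN(0, sigma).

    For a circularly symmetric complex Gaussian of variance sigma, the real
    and imaginary parts are independent N(0, sigma/2), so |h_xy|^2 is
    exponentially distributed with mean sigma (Rayleigh amplitude).
    """
    std = math.sqrt(sigma / 2.0)
    h = complex(rng.gauss(0.0, std), rng.gauss(0.0, std))
    return distance ** (-gamma / 2.0) * h


def mean_power_gain(distance: float, gamma: float, sigma: float,
                    n_draws: int = 20000, seed: int = 1) -> float:
    """Monte Carlo estimate of E|g_xy|^2, which should approach sigma * d^-gamma."""
    rng = random.Random(seed)
    total = sum(abs(channel_gain(distance, gamma, sigma, rng)) ** 2 for _ in range(n_draws))
    return total / n_draws


if __name__ == "__main__":
    print(mean_power_gain(distance=2.0, gamma=3.0, sigma=1.0))  # close to 2**-3 = 0.125
```

Since only \(\vert g_{\mathrm {xy}} \vert ^2\) enters the capacity expressions below, one could equally draw the power gain directly from an exponential distribution with mean \(\sigma d_{\mathrm {xy}}^{-\gamma }\).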

We define \(C^{\text {I}}_{\text {F}}\) (\(C^{\text {I}}_{\text {B}}\), respectively) as the capacity of the interweave cognitive access in the case of free (busy, respectively) channel state. Likewise, we use the following notations: \(C^{\text {U}}_{\text {F}}\) and \(C^{\text {U}}_{\text {B}}\) to denote the channel capacities for the underlay mode.

We use the Shannon capacity formula to compute the channel capacities. \(C^{\text {I}}_{\text {F}}\) and \(C^{\text {I}}_{\text {B}}\) can be expressed as follows:

$$\begin{aligned} C^{\text {I}}_{\text {F}}= & {} W_{\mathrm {p}} \log _{2}\left( 1 + \frac{\vert g_{\mathrm {SS}} \vert ^2 P^{I}_{s}}{N_{\mathrm {s}}}\right) \end{aligned}$$
(1)
$$\begin{aligned} C^{\text {I}}_{\text {B}}= & {} W_{\mathrm {p}} \log _{2}\left( 1 + \frac{\vert g_{\mathrm {SS}} \vert ^2 P^{I}_{s}}{N_{\mathrm {s}} + \vert g_{\mathrm {PS}} \vert ^2 P_p}\right) \end{aligned}$$
(2)

where \(P_{p}\) and \(P^{I}_{s}\) are the transmit powers of the primary user and of the secondary user in the interweave mode, respectively. As a practical matter, we consider the nearby primary link to be the dominant interferer, and thus we neglect the co-channel and adjacent-channel interference from the surrounding cognitive cells.

Similarly, \(C^{\text {U}}_{\text {F}}\) and \(C^{\text {U}}_{\text {B}}\) can be derived as:

$$\begin{aligned} C^{\text {U}}_{\text {F}}= & {} W_{\mathrm {p}} \log _{2}\left( 1 + \frac{\vert g_{\mathrm {SS}} \vert ^2 P^{U}_{s}}{N_{\mathrm {s}}}\right) , \nonumber \\ P^{U}_{s}= & {} \mathrm {min}\left( P_{s}, \frac{I}{\vert g_{\mathrm {SP}} \vert ^2}\right) \end{aligned}$$
(3)
$$\begin{aligned} C^{\text {U}}_{\text {B}}= & {} W_{\mathrm {p}} \log _{2}\left( 1 + \frac{\vert g_{\mathrm {SS}} \vert ^2 P^{U}_{s}}{N_{\mathrm {s}} + \vert g_{\mathrm {PS}} \vert ^2 P_{p} }\right) , \nonumber \\ P^{U}_{s}= & {} \mathrm {min}\left( P_{s}, \frac{I}{\vert g_{\mathrm {SP}} \vert ^2}\right) \end{aligned}$$
(4)

where I is the interference threshold defined at the primary receiver, \(P_{s}\) is the maximum transmit power of the secondary transmitter, and \(P^{U}_{s}\) is the power transmitted by the cognitive base station in the underlay regime, which is capped so that the interference \(\vert g_{\mathrm {SP}} \vert ^2 P^{U}_{s}\) received by the primary receiver does not exceed I.
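Equations (1) to (4) differ only in the transmit power (interweave power or capped underlay power) and in whether the primary interference term \(\vert g_{\mathrm {PS}} \vert ^2 P_p\) appears in the denominator. The following Python sketch evaluates all four for one channel realization; the function and argument names are hypothetical, and the squared gains \(\vert g \vert ^2\) are assumed to be given (for example drawn from the channel model above).

```python
import math
from typing import Dict


def capacity(bandwidth: float, g_ss_sq: float, power: float, noise: float,
             interference: float = 0.0) -> float:
    """Shannon capacity W * log2(1 + |g_SS|^2 P / (N_s + interference)) in bit/s."""
    return bandwidth * math.log2(1.0 + g_ss_sq * power / (noise + interference))


def underlay_power(p_max: float, i_th: float, g_sp_sq: float) -> float:
    """P_s^U = min(P_s, I / |g_SP|^2), which keeps |g_SP|^2 * P_s^U <= I."""
    return min(p_max, i_th / g_sp_sq)


def mode_capacities(w_p: float, g_ss_sq: float, g_ps_sq: float, g_sp_sq: float,
                    p_i: float, p_max: float, p_p: float, i_th: float,
                    n_s: float) -> Dict[str, float]:
    """Capacities of Eqs. (1) to (4) for one realization of the squared channel gains."""
    p_u = underlay_power(p_max, i_th, g_sp_sq)
    primary_interference = g_ps_sq * p_p  # |g_PS|^2 P_p, present when the TV channel is busy
    return {
        "C_I_F": capacity(w_p, g_ss_sq, p_i, n_s),                        # Eq. (1)
        "C_I_B": capacity(w_p, g_ss_sq, p_i, n_s, primary_interference),  # Eq. (2)
        "C_U_F": capacity(w_p, g_ss_sq, p_u, n_s),                        # Eq. (3)
        "C_U_B": capacity(w_p, g_ss_sq, p_u, n_s, primary_interference),  # Eq. (4)
    }
```

For instance, with unit bandwidth, noise and secondary gain, \(P^{I}_{s}=P_s=3\), \(I=2\) and \(\vert g_{\mathrm {SP}} \vert ^2=2\), the underlay power is capped at 1, so \(C^{\text {U}}_{\text {F}}\) (1 bit/s) falls below \(C^{\text {I}}_{\text {F}}\) (2 bit/s).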

Fig. 4

Schematic diagram of the overall secondary capacity

Although software-defined radio (SDR) has recently reached an advanced level of intelligent and accurate sensing, sensing results are still not fully trustworthy. Two parameters characterize the imperfection of the sensing process, namely the false alarm probability \(P_{\text {fa}}\) and the missed detection probability \(P_{\text {m}}\).

The secondary device switches in time between the interweave and underlay modes depending on the outcome of the sensing operation. It transmits freely over the selected frequency using the interweave approach when the channel is sensed idle, which happens either when the primary user is actually absent and no false alarm occurs, or when a missed detection occurs. Otherwise, the secondary device underlays its signal so that the secondary communication is not detrimental to the licensed link. A graphical representation of the behavior of the secondary system is shown in Fig. 4. Hence, the overall capacity of the secondary system, denoted as \(C_{\mathrm {T}}\), can be written as:

$$\begin{aligned} C_{\text {T}} = (1 - P^{\text {LTE}}_{\text {B}}) C_{\text {LTE}} + (1-\alpha ) P^{\text {LTE}}_{\text {B}} (1 - P^{\text {TV}}_{\text {B}}) C_{\text {TV}} \end{aligned}$$
(5)

\(C_{\text {LTE}}\) denotes the licensed channels capacity and \(C_{\text {TV}}\) is the extra capacity growth due to the reuse of TV channels.

The overall capacity of the secondary system is a weighted linear combination of the capacities obtained in Eqs. (1), (2), (3) and (4) where each weight is a function of the blocking probability, the probability of having a TV spectrum hole and the probability of a false alarm or a missed detection (Fig. 4), and takes the form:

$$\begin{aligned}&C_{\text {T}} \nonumber \\&\quad = (1 - P^{\text {LTE}}_{\text {B}}) C_{\text {LTE}} + (1-\alpha ) P^{\text {LTE}}_{\text {B}} (1 - P^{\text {TV}}_{\text {B}}) \nonumber \\&\qquad \times \left[ (1- P_{\text {a}}) (1 - P_{\text {fa}}) C^{\text {I}}_{\text {F}} + P_{\text {a}} P_{\text {m}}C^{\text {I}}_{\text {B}}\right] \nonumber \\&\qquad + (1-\alpha ) P^{\text {LTE}}_{\text {B}} (1 - P^{\text {TV}}_{\text {B}}) \nonumber \\&\qquad \times \left[ ( 1 - P_{\text {a}}) P_{\text {fa}} C^{\text {U}}_{\text {F}}+ P_{\text {a}}(1 - P_{\text {m}}) C^{\text {U}}_{\text {B}} \right] \end{aligned}$$
(6)

\(P_{\text {a}}\) is the probability that the TV channel is busy, modeled as a Bernoulli variable (Eq. (7)). \(P^{\text {LTE}}_{\text {B}}\) is the blocking probability of the legacy carriers, i.e. the probability that the LRRs of the cellular infrastructure are not sufficient to deliver all the video layers (Eq. (8)). Likewise, \(P^{\text {TV}}_{\text {B}}\) is the blocking probability related to the TV spectrum holes (Eq. (10)).

In the United States, television broadcasting operates over roughly 50 channels with a frequency spacing of 6 MHz (54 to 72 MHz: TV channels 2 to 4; 76 to 88 MHz: TV channels 5 and 6; 174 to 216 MHz: TV channels 7 to 13; 470 to 608 MHz: TV channels 14 to 36; and 614 to 698 MHz: TV channels 38 to 51), geographically scattered throughout the country. We consider a given location that is assigned a set of n frequencies according to a predefined frequency plan [23]. Accordingly:

$$\begin{aligned} P_{\text {a}} = \frac{n}{50} \end{aligned}$$
(7)

3.3 Traffic blocking in cognitive cellular networks

The availability of cellular radio resources cannot be guaranteed at all times, since these carriers are finite and shared among thousands of users.

In this paper, we consider an omnidirectional cell pattern and model each base station as a queuing system: mobile devices are the clients, arrivals correspond to incoming call requests, departures correspond to call disconnections, and the available frequencies act as the servers. Hence, each cell can be modeled as a chain of two independent, sequentially arranged M/M/c/c queuing systems, each with a fixed number of servers, Poisson arrivals and exponentially distributed service times, with no waiting room and no retrials. The first queue manages the legacy frequencies, whereas the second queue manages the TV channels. An important parameter characterizing such systems is the blocking probability, i.e. the probability that there are not enough radio resources to satisfy a new call attempt.

In this study, the blocking probability of the first queue corresponds to the case where the cell becomes overcrowded; the mobile connection may then switch to an underutilized or free TV channel instead of being forced to wait or being dropped as in legacy cellular networks. The RRM block supervises this key parameter and dynamically assigns radio resources from a joint pool composed of cellular and TVWS frequencies, with TV carriers attributed to incoming requests only when the cellular channels are exhausted.
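The admission rule described above (cellular servers first, TV servers only once the cellular pool is exhausted, and no waiting room) can be sketched as a small state machine. The class below is a hypothetical illustration of the two chained M/M/c/c pools, not the RRM implementation of this paper; in particular, it assumes that calls already on TV carriers are not moved back to the cellular pool when a cellular server frees up.

```python
from dataclasses import dataclass


@dataclass
class JointPoolRRM:
    """Hypothetical two-stage admission: cellular servers first, TV servers second.

    Each pool behaves like an M/M/c/c system: a request that finds every
    server busy is not queued but cleared (blocked).
    """
    s_lte: int
    s_tv: int
    busy_lte: int = 0
    busy_tv: int = 0

    def admit(self) -> str:
        """Serve a new call, returning the pool used or 'BLOCKED'."""
        if self.busy_lte < self.s_lte:
            self.busy_lte += 1
            return "LTE"
        if self.busy_tv < self.s_tv:  # TV carriers only once LTE is exhausted
            self.busy_tv += 1
            return "TV"
        return "BLOCKED"

    def release(self, pool: str) -> None:
        """Free one server in the given pool when a call ends."""
        if pool == "LTE" and self.busy_lte > 0:
            self.busy_lte -= 1
        elif pool == "TV" and self.busy_tv > 0:
            self.busy_tv -= 1


if __name__ == "__main__":
    rrm = JointPoolRRM(s_lte=2, s_tv=1)
    print([rrm.admit() for _ in range(4)])  # ['LTE', 'LTE', 'TV', 'BLOCKED']
```

Driving such an object with Poisson arrivals and exponential holding times would give an empirical counterpart to the blocking probabilities of Eqs. (8) and (10).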

For such systems, the blocking probability is given by the Erlang-B formula:

$$\begin{aligned} P^{\text {LTE}}_{\text {B}} = \frac{ \left( \frac{\lambda }{\mu }\right) ^{S_{\text {LTE}}}/S_{\text {LTE}}!}{\sum _{k=0}^{S_{\text {LTE}}} \left( \frac{\lambda }{\mu }\right) ^{k}/k!} \end{aligned}$$
(8)

where \(\lambda \) denotes the mean call arrival rate, \(1/\mu \) is the mean service time and \(S_{\text {LTE}}\) is the number of the available parallel servers.
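Evaluating Eq. (8) literally involves large powers and factorials once the number of servers grows. A common alternative, sketched below in Python with hypothetical function names, is the standard Erlang-B recursion \(B(0)=1\), \(B(k)=A\,B(k-1)/(k+A\,B(k-1))\) with \(A=\lambda /\mu \), which is algebraically identical to the closed form; the direct transcription is kept for comparison.

```python
import math


def erlang_b(offered_load: float, servers: int) -> float:
    """Blocking probability of an M/M/c/c queue, as in Eqs. (8) and (10).

    offered_load is lambda / mu in Erlangs. The recursion
    B(0) = 1, B(k) = A B(k-1) / (k + A B(k-1))
    equals the closed form but avoids huge powers and factorials.
    """
    if servers < 0 or offered_load < 0:
        raise ValueError("servers and offered_load must be non-negative")
    b = 1.0
    for k in range(1, servers + 1):
        b = offered_load * b / (k + offered_load * b)
    return b


def erlang_b_closed_form(offered_load: float, servers: int) -> float:
    """Direct transcription of Eq. (8); adequate for small server counts."""
    num = offered_load ** servers / math.factorial(servers)
    den = sum(offered_load ** k / math.factorial(k) for k in range(servers + 1))
    return num / den
```

For example, two servers offered 2 Erlangs block 40% of the calls, since \((2^2/2!)/(1+2+2^2/2!) = 2/5\).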

To connect to a given cell and make calls or transfer data, any new entrant must first execute the cell search procedure, during which two successive synchronization signals (i.e. PSS and SSS) must be detected. Each signal is transmitted using \(N_{\mathrm {SYN}}=6\) RRBs. As a result, the number of servers in the base station equals the maximum number of cellular connections that can be correctly established simultaneously:

$$\begin{aligned} S_{\mathrm {LTE}} = \frac{N_{\mathrm {LTE}}}{N_{\mathrm {SYN}}} \end{aligned}$$
(9)

\(N_{\mathrm {LTE}}\) is the total number of RRBs per cognitive base station and depends on the cellular channel bandwidth (e.g. \(N_{\mathrm {LTE}}=100\) for \(W_{\mathrm {s}}=20~\mathrm {MHz}\)).

Once the licensed carriers are fully occupied, new traffic demands are redirected to a second queue that manages the TV resources. In this case, a new entrant is blocked and rejected from the system if no TV channel is available. Assuming that the traffic characteristics remain unchanged, the blocking probability of the TV resource queue follows from the same Erlang-B expression:

$$\begin{aligned} P^{\text {TV}}_{\text {B}} = \frac{ \left( \frac{\lambda }{\mu }\right) ^{S_{\text {TV}}}/S_{\text {TV}}!}{\sum _{k=0}^{S_{\text {TV}}} \left( \frac{\lambda }{\mu }\right) ^{k}/k!} \end{aligned}$$
(10)

The number of servers \(S_{\text {TV}}\) depends upon the number of idle TV channels in that geographic area and is given by:

$$\begin{aligned} S_{\mathrm {TV}} = \frac{N_{\mathrm {TV}}}{N_{\mathrm {SYN}}} \end{aligned}$$
(11)

\(N_{\mathrm {TV}}\) is the total number of RRBs offered by the idle TV channels (e.g. \(N_{\mathrm {TV}}=25\times (50 - n)\) for \(W_{\mathrm {p}}=6~\mathrm {MHz}\), where n is defined in Eq. (7)).
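Chaining Eqs. (9) to (11) gives both blocking probabilities from the RRB counts. The Python sketch below uses the example values stated in the text (\(N_{\mathrm {SYN}}=6\), 25 RRBs per 6 MHz TV channel, 50 channels); the helper names are hypothetical, and rounding \(N/N_{\mathrm {SYN}}\) down to a whole number of servers is our assumption, since Eq. (9) gives 100/6 for a 20 MHz carrier. Following Eq. (10), the same offered load \(\lambda /\mu \) feeds both queues.

```python
from typing import Tuple

N_SYN = 6  # RRBs per synchronization signal (PSS or SSS), as stated in the text


def servers_from_rrbs(n_rrbs: int, n_syn: int = N_SYN) -> int:
    """Eqs. (9) and (11); rounding down to whole servers is an assumption here."""
    return n_rrbs // n_syn


def idle_tv_rrbs(n_assigned: int, total_channels: int = 50, rrbs_per_channel: int = 25) -> int:
    """N_TV = 25 * (50 - n) for 6 MHz TV channels, as in the example after Eq. (11)."""
    if not 0 <= n_assigned <= total_channels:
        raise ValueError("n must lie between 0 and the total number of channels")
    return rrbs_per_channel * (total_channels - n_assigned)


def erlang_b(offered_load: float, servers: int) -> float:
    """M/M/c/c blocking probability via the standard Erlang-B recursion."""
    b = 1.0
    for k in range(1, servers + 1):
        b = offered_load * b / (k + offered_load * b)
    return b


def blocking_probabilities(offered_load: float, n_lte_rrbs: int,
                           n_assigned_tv: int) -> Tuple[float, float]:
    """Return (P_B^LTE, P_B^TV) from Eqs. (8) and (10)."""
    s_lte = servers_from_rrbs(n_lte_rrbs)
    s_tv = servers_from_rrbs(idle_tv_rrbs(n_assigned_tv))
    return erlang_b(offered_load, s_lte), erlang_b(offered_load, s_tv)


if __name__ == "__main__":
    # 20 MHz carrier (N_LTE = 100), n = 40 busy TV channels, 15 Erlangs offered (illustrative).
    print(blocking_probabilities(15.0, 100, 40))
```

With these values, \(S_{\mathrm {LTE}}=16\) and \(S_{\mathrm {TV}}=41\), so the TV queue blocks far less often than the cellular one under the same offered load.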

3.4 Unreliable spectrum sensing

According to [24], the false alarm and missed detection probabilities over a Rayleigh fading channel can be written as:

$$\begin{aligned} P_{\text {fa}} = \frac{\varGamma \left( \frac{N}{2}, \frac{\lambda _{th}}{2}\right) }{\varGamma \left( \frac{N}{2}\right) } \end{aligned}$$
(12)

and

$$\begin{aligned} P_{\text {m}}= & {} 1 - \exp \left( - \frac{\lambda _{th}}{2}\right) \sum _{i=0}^{\frac{N}{2}-2} \frac{\left( \frac{\lambda _{th}}{2}\right) ^{i}}{i!}\nonumber \\&- {\left( \frac{1 + \bar{\gamma }}{\bar{\gamma }}\right) }^{\frac{N}{2}-1} \exp \left( -\frac{\lambda _{th}}{2(1+\bar{\gamma })}\right) \nonumber \\&+ {\left( \frac{1 + \bar{\gamma }}{\bar{\gamma }}\right) }^{\frac{N}{2}-1} \exp \left( -\frac{\lambda _{th}}{2}\right) \sum _{i=0}^{\frac{N}{2}-2}\frac{1}{i!}{\left( \frac{\lambda _{th} \bar{\gamma }}{2(1+\bar{\gamma })}\right) }^{i}\nonumber \\ \end{aligned}$$
(13)

where \(\varGamma (.)\) and \(\varGamma (.,.)\) are the Gamma function and the upper incomplete Gamma function, respectively. \(N = 2\alpha W_{\mathrm {p}}T\) is the number of samples of the discrete primary signal, where \(W_{\mathrm {p}}\) is the TV channel bandwidth and \(\alpha T\) is the sensing duration; \(\lambda _{th}\) denotes the decision threshold, and \(\bar{\gamma }\) is the average SNR of the primary signal perceived at the secondary sensing device. The latter parameter can be calculated as:

$$\begin{aligned} \bar{\gamma } = \frac{P_{\mathrm {p}} \vert g_{\mathrm {TT}} \vert ^2}{N_{\mathrm {s}}} \end{aligned}$$
(14)
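Equations (12) to (14) can be evaluated with the standard library alone when \(N/2\) is an integer, because then \(\varGamma (u,x)/\varGamma (u) = e^{-x}\sum _{i=0}^{u-1} x^{i}/i!\). The sketch below relies on that assumption (an even N) and uses hypothetical names. It transcribes Eq. (13) directly, which is numerically fragile for large N combined with low \(\bar{\gamma }\) (the factor \(((1+\bar{\gamma })/\bar{\gamma })^{N/2-1}\) can overflow and the terms cancel), so it is best suited to small N; for realistic sample counts, Eq. (12) can instead be computed with a library routine such as `scipy.special.gammaincc`.

```python
import math


def _exp_partial_sum(x: float, upper: int) -> float:
    """Return sum_{i=0}^{upper} x**i / i!; the sum is empty (0.0) when upper < 0."""
    total, term = 0.0, 1.0
    for i in range(upper + 1):
        if i > 0:
            term *= x / i
        total += term
    return total


def num_samples(alpha: float, w_p: float, t_frame: float) -> int:
    """N = 2 * alpha * W_p * T, rounded to the nearest integer."""
    return int(round(2.0 * alpha * w_p * t_frame))


def average_snr(p_p: float, g_tt_sq: float, n_s: float) -> float:
    """Eq. (14): mean SNR of the primary signal at the sensing device."""
    return p_p * g_tt_sq / n_s


def prob_false_alarm(n: int, threshold: float) -> float:
    """Eq. (12), using Gamma(u, x)/Gamma(u) = e^-x * sum_{i<u} x^i / i! for integer u = N/2."""
    if n < 2 or n % 2:
        raise ValueError("this sketch assumes an even number of samples N >= 2")
    u, x = n // 2, threshold / 2.0
    return math.exp(-x) * _exp_partial_sum(x, u - 1)


def prob_missed_detection(n: int, threshold: float, snr: float) -> float:
    """Eq. (13): missed detection probability of energy detection over Rayleigh fading."""
    if n < 2 or n % 2:
        raise ValueError("this sketch assumes an even number of samples N >= 2")
    u, x = n // 2, threshold / 2.0
    ratio = ((1.0 + snr) / snr) ** (u - 1)
    y = threshold * snr / (2.0 * (1.0 + snr))
    return (1.0
            - math.exp(-x) * _exp_partial_sum(x, u - 2)
            - ratio * math.exp(-threshold / (2.0 * (1.0 + snr)))
            + ratio * math.exp(-x) * _exp_partial_sum(y, u - 2))
```

For N = 2 the expressions reduce to \(P_{\text {fa}}=e^{-\lambda _{th}/2}\) and \(P_{\text {m}}=1-e^{-\lambda _{th}/(2(1+\bar{\gamma }))}\), which is a convenient check of the transcription.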

Plugging Eqs. (1) to (4), (7), (8), (10), (12) and (13) into Eq. (6), together with Eqs. (9), (11) and (14), fully defines the overall capacity \(C_{\text {T}}\).

Fig. 5

A general diagram for error propagation in one description

4 Proposed video coding scheme

The integration of multimedia into recent systems has enabled more interactive exchange and interpretation of information, owing to the portability of multimedia files and the large amount of information they carry. However, rich content such as video is challenging to deliver over the network. Video streaming over wireless networks faces many challenges, such as variations in transmission bandwidth, jitter, delay and high packet loss rates. These problems are further complicated in cognitive radio networks (CRNs), where the available transmission bandwidth is scarce and dynamically varying, and each secondary user experiences interference from both the primary and the contending secondary peers.

Scalable video codecs create dependent layers and thereby provide scalability. Unfortunately, such schemes typically incur a slight loss in compression efficiency compared with the non-scalable version of the same codec, since the scalability feature introduces additional redundancy. MDC, in turn, is well suited to communication networks with multiple paths: each MDC description is independently decodable and can thus overcome path failures. MDC descriptions may, however, sacrifice further compression efficiency to be self-decodable, and designing robust MDC schemes may incur additional complexity, mainly depending on how the original data are partitioned into descriptions and how many descriptions are generated. A combination of MDC and SVC can further improve adaptation to client capabilities and bandwidth heterogeneity: devices can exploit path diversity and tune in to the bit-stream that best fits their characteristics, at the cost of lower compression efficiency and higher coding complexity. A multiple description scalable coding framework therefore needs to carefully balance scalability, path diversity, bit-rate overhead and complexity, as these aspects are strongly coupled. The impact of this trade-off on the overall performance of the proposed system is not quantified in this study.

Throughout the paper, we use a joint multiple description and layered coding (MDSC) technique [25], which combines the merits of multiple description coding (MDC) and SVC. Mixing SVC and MDC allows the transmitted video bit stream to be adapted to network conditions, traffic load and terminal capabilities. The proposed transmission scheme provides only partial, basic error resilience in some specific cases. On the one hand, I-frames serve as reference frames and can thus stop error propagation from P-frames. On the other hand, the separation into even and odd frames also helps mitigate packet losses. If, for example, an error occurs in transit in one of the frames of the odd description, it propagates to the next odd frame, and so on. The proposed MDSC decoder can recover the original stream at full temporal resolution from the even and odd frames received before the error occurred. It can then either switch to the even stream and display the sequence at a reduced frame rate, or employ an interpolation mechanism to predict the erroneous odd frames and preserve full temporal resolution, as shown in Fig. 5. A missing frame can simply be interpolated as \(f^{miss}_{odd_{i}}=\dfrac{f_{even_{i-1}}+f_{even_{i+1}}}{2}\).

Concretely, the input stream is separated into two sub-sequences of even and odd frames, and the proposed MDSC encoder may be composed of two parallel conventional SVC encoders (Fig. 6). Another alternative is a single SVC encoder capable of storing the last two previously encoded frames, so that it can select the appropriate past frame as the reference and predictor for the next input frame (\(f_{odd_{i-2}} \curvearrowright f_{odd_{i}}\) or \(f_{even_{i-2}} \curvearrowright f_{even_{i}}\)) [26].
The even and odd separation was chosen for its low complexity, which matches the limited computational power of mobile terminals, and for the complementary properties of even and odd frames, which are temporally adjacent. The video feed is transported over the User Datagram Protocol (UDP), which provides a basic transport service and thus allows for reduced transmission delay and great flexibility at the application layer for advanced services.
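The even/odd separation and the frame interpolation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MDSC decoder itself: frames are simplified to flat lists of pixel values, the SVC encoding and decoding of each sub-stream are omitted, the function names are hypothetical, and repeating the nearest received frame at the sequence edges is our own choice.

```python
from typing import List, Optional, Sequence, Tuple

Frame = List[float]  # one frame as a flat list of pixel values (simplification)


def split_even_odd(frames: Sequence[Frame]) -> Tuple[List[Frame], List[Frame]]:
    """Split a sequence into even- and odd-indexed sub-streams, each at half the frame rate."""
    return list(frames[0::2]), list(frames[1::2])


def reconstruct(even: Optional[Sequence[Optional[Frame]]],
                odd: Optional[Sequence[Optional[Frame]]],
                total: int) -> List[Frame]:
    """Merge the two sub-streams back into one full-rate sequence.

    A whole description may be None (not received) and individual frames may be
    None (lost or corrupted). Each missing frame f_k is concealed as
    (f_{k-1} + f_{k+1}) / 2 from its received temporal neighbours; at the
    sequence edges the single available neighbour is repeated.
    """
    received: List[Optional[Frame]] = [None] * total
    for offset, description in ((0, even), (1, odd)):
        if description is None:
            continue
        for i, frame in enumerate(description):
            received[2 * i + offset] = frame

    out: List[Frame] = []
    for k, frame in enumerate(received):
        if frame is not None:
            out.append(list(frame))
            continue
        prev = received[k - 1] if k > 0 else None
        nxt = received[k + 1] if k + 1 < total else None
        if prev is not None and nxt is not None:
            out.append([(a + b) / 2.0 for a, b in zip(prev, nxt)])
        elif prev is not None or nxt is not None:
            out.append(list(prev if prev is not None else nxt))
        else:
            raise ValueError(f"frame {k}: no received neighbour to interpolate from")
    return out


if __name__ == "__main__":
    frames = [[float(k)] * 4 for k in range(6)]   # frame k filled with the value k
    even, odd = split_even_odd(frames)
    print(reconstruct(even, odd, len(frames)) == frames)   # both descriptions received
    print(reconstruct(even, None, len(frames))[1])         # odd frame 1 interpolated
```

Averaging the two even neighbours is the simple interpolator mentioned in the text; motion-compensated interpolation would replace the averaging line.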

Fig. 6 Joint H.264/SVC-MDC video coding scheme

The MAC layer receives the channel properties provided by the physical layer and the traffic parameters announced by the RRM scheduler, from which the optimal overall capacity \(C_{\text {T}}\) is computed according to Eq. (6). The system capacity is the major factor limiting the source requirement, i.e., the size of the video file to be transmitted. Given the source characteristics, namely a frame rate \(N_{\mathrm {f}}\), a coding bit-rate \(b_{\mathrm {f}}\) and a total number of video frames \(L_{\mathrm {f}}\), and assuming a strict transmission deadline \(T_{\mathrm {f}}\), the video of duration \(L_{\mathrm {f}}/N_{\mathrm {f}}\) contains \(b_{\mathrm {f}} L_{\mathrm {f}}/N_{\mathrm {f}}\) bits, which must be delivered within \(T_{\mathrm {f}}\). This yields the following constraint:

$$\begin{aligned} \frac{b_{\mathrm {f}} \times L_{\mathrm {f}}}{N_{\mathrm {f}}} \le C_{\text {T}} \times T_{\mathrm {f}} \end{aligned}$$
(15)

The inequality above determines an upper bound on the coding bit-rate, as follows:

$$\begin{aligned} b_{\mathrm {f}} \le \frac{C_{\text {T}} \times T_{\mathrm {f}} \times N_{\mathrm {f}}}{L_{\mathrm {f}}} \end{aligned}$$
(16)

Likewise, the coding bit-rate \(b_{\mathrm {l}}\) in the case of the legacy cellular network operating in a non-sharing mode is capped by the Shannon limit and satisfies the following constraint:

$$\begin{aligned} b_{\mathrm {l}} \le \frac{(1 - P^{\text {LTE}}_{\text {B}}) \times C_{\text {LTE}} \times T_{\mathrm {f}} \times N_{\mathrm {f}}}{L_{\mathrm {f}}} \end{aligned}$$
(17)
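The two bounds in Eqs. (16) and (17) are simple enough to evaluate directly. The Python sketch below does so; the function names are hypothetical, and apart from \(T_{\mathrm {f}} = 1.5~\mathrm {s}\) (taken from Sect. 5) the numbers in the example are made up for illustration and are not results of this study.

```python
def max_full_rate_bitrate(c_t: float, t_f: float, n_f: float, l_f: int) -> float:
    """Eq. (16): largest b_f such that b_f * L_f / N_f <= C_T * T_f (bit/s)."""
    return c_t * t_f * n_f / l_f


def max_legacy_bitrate(c_lte: float, p_b_lte: float, t_f: float, n_f: float, l_f: int) -> float:
    """Eq. (17): same bound with the effective legacy capacity (1 - P_B^LTE) * C_LTE."""
    return (1.0 - p_b_lte) * c_lte * t_f * n_f / l_f


if __name__ == "__main__":
    # Hypothetical values, for illustration only (not taken from the simulations).
    c_t, c_lte, p_b = 2e6, 1e6, 0.2  # capacities in bit/s, LTE blocking probability
    t_f, n_f, l_f = 1.5, 30.0, 300   # deadline (s), frame rate (fps), number of frames
    print(f"b_f <= {max_full_rate_bitrate(c_t, t_f, n_f, l_f) / 1e3:.0f} kbit/s")
    print(f"b_l <= {max_legacy_bitrate(c_lte, p_b, t_f, n_f, l_f) / 1e3:.0f} kbit/s")
```

With these hypothetical values the bounds evaluate to 300 kbit/s for \(b_{\mathrm {f}}\) and 120 kbit/s for \(b_{\mathrm {l}}\); these targets are what the rate-control step then has to hit.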

The capacity \(C_{\text {T}}\) can be computed offline at the CBS to save computation time.

The central idea is to deliver a bitstream that automatically adapts to the CBS capabilities, given the limited availability of radio resources. Our MDSC encoder therefore generates two levels of video quality: a low-quality stream at bit-rate \(b_{\mathrm {l}}\), used whenever no TV spectrum holes are available or for backward compatibility with previous mobile generations, and a full-quality stream at bit-rate \(b_{\mathrm {f}}\), used when the CBS can afford to use more physical resources, in particular TVWS.

The target coding bit-rate is directly related to the encoder parameters, in particular the quantization parameter Q. This relationship depends closely on the content to be encoded, including the amount of motion, the scene dynamics and the prediction mode used. Therefore, the correspondence \((b_{\mathrm {l}},~b_{\mathrm {f}}) \leftrightarrow (Q_{\mathrm {l}},~Q_{\mathrm {f}})\) is hard to characterize. In what follows, we employ the JSVM FixedQPEncoder utility [27] to find the optimal Q value. This tool applies an iterative logarithmic search that uses the target bit-rate, within a pre-defined mismatch range, as its stopping criterion. The algorithm progressively adjusts the quantization parameter and picks the optimal Q value for both the base and enhancement layers. Although the FixedQPEncoder tool has a high computational complexity, it provides a practical means of testing bit allocation and rate control mechanisms.
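The idea of searching Q against a target bit-rate can be illustrated with a plain bisection over integer QP values. This Python sketch is a stand-in, not FixedQPEncoder's actual logarithmic search or mismatch tolerance: it assumes the bit-rate is non-increasing in QP, and the rate model (1 Mbit/s at QP 22, halving for every increase of 6 in QP, a common rule of thumb for H.264) is a hypothetical placeholder for what would really be a trial encode per probe.

```python
from typing import Callable


def find_qp(rate_of_qp: Callable[[int], float], target_bps: float,
            qp_min: int = 0, qp_max: int = 51) -> int:
    """Return the smallest QP (finest quantization) whose bit-rate does not exceed target_bps.

    Assumes rate_of_qp is non-increasing in QP; in practice each call would be a trial encode.
    """
    if rate_of_qp(qp_max) > target_bps:
        raise ValueError("target bit-rate not reachable within the QP range")
    lo, hi = qp_min, qp_max
    while lo < hi:
        mid = (lo + hi) // 2
        if rate_of_qp(mid) <= target_bps:
            hi = mid       # target met: try a finer quantizer
        else:
            lo = mid + 1   # too many bits: quantize more coarsely
    return lo


def toy_rate_model(qp: int) -> float:
    """Hypothetical rate model: 1 Mbit/s at QP 22, halving for every +6 in QP."""
    return 1e6 * 2.0 ** (-(qp - 22) / 6.0)


if __name__ == "__main__":
    qp = find_qp(toy_rate_model, target_bps=300e3)
    print(qp, round(toy_rate_model(qp)))
```

Bisection needs about \(\log_2 52 \approx 6\) probes over the H.264 QP range, which is why a search of this kind stays tractable even though each probe is a full encode.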

The original video sequence is split into even and odd frames, providing two separate and independent streams at half the original frame rate [26], as illustrated in Fig. 6. We then use the SVC extension of the H.264/AVC standard [8] to encode each of the two resulting streams into a base layer and one enhancement layer, using the quantization parameters produced by the FixedQPEncoder tool, namely \(Q^{\mathrm {even}}_{\mathrm {l}}\) (or \(Q^{\mathrm {odd}}_{\mathrm {l}}\)) and \(Q^{\mathrm {even}}_{\mathrm {f}}\) (or \(Q^{\mathrm {odd}}_{\mathrm {f}}\)), respectively. The current work can readily be extended to support other types of coding scalability. The inter-layer prediction option is turned on to improve the coding efficiency and reduce the bit-rate of the enhancement layer. Such frameworks generate multiple quality levels and provide graceful quality degradation under lossy network conditions.

Initially, the service provider entity pushes the original media file to multiple cognitive base stations. Each CBS then proactively encodes the media into two descriptions \(D_{1}\) and \(D_{2}\) depending on the traffic pattern in its cell, as depicted in Fig. 6. Both descriptions are then conveyed over the secondary link. At the end-user side, the corresponding MDSC decoder reproduces the even and odd video sequences, which are merged to reconstruct the decoded video sequence at full temporal resolution. If only one description is received (\(D_{1}\) or \(D_{2}\)), a low-quality stream can still be restored by interpolating the missing frames (using either simple or more sophisticated interpolation techniques), so that a default video quality \(b_{\mathrm {l}}\) can be guaranteed, whereas the presence of TVWS channels allows both descriptions \(D_{1}+D_{2}\) to be carried and a full, improved video quality \(b_{\mathrm {f}}\) to be displayed.

In this way, all users may receive a basic video experience over legacy frequencies, while the system can increase its capacity and acquire more spectrum through TVWS to meet rising traffic demands.

Table 1 Summary of simulation parameters

5 Simulation results

For these experiments and unless otherwise stated, we assume a downlink scenario in which a cognitive base station conveys a video stream to a mobile station within the coverage area of a digital TV broadcast, with an interference temperature constraint of \(I=1~\mathrm {dB}\) and a transmission deadline of \(T_{\mathrm {f}} = 1.5~\mathrm {s}\). Simulations are performed under harsh, highly shadowed environmental conditions with a path loss exponent of \(\gamma =4\).

The JSVM software [27] served as the H.264/SVC encoder/decoder. We use JSVM version 9.19.15, mainly because it is one of the most stable recent releases. To achieve the target bit-rates, the FixedQPEncoder utility provided by JSVM is used to find the proper quantization parameters. Video quality assessment metrics are computed using the freely available Video Quality Measurement Tool (VQMT) [28]. Video processing operations and numerical simulations of the proposed model were performed in Matlab R2015a. All tests were conducted on a machine with an Intel(R) Core(TM) i5-4590S CPU @ 3.00 GHz (4 CPUs), 8 GB RAM and the 64-bit Windows 7 Professional operating system.
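VQMT computes the quality metrics in this study; for readers unfamiliar with PSNR, the following Python sketch shows the standard per-frame definition, \(\mathrm{PSNR} = 10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})\) with \(\mathrm{MAX} = 255\) for 8-bit samples, applied to a single plane. It is not VQMT's implementation: the function names are hypothetical, averaging per-frame values over a sequence is only one common convention, and MS-SSIM is omitted because it requires a multi-scale filtering pipeline.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[int], distorted: Sequence[int], max_value: int = 255) -> float:
    """PSNR in dB between two equally sized frames (8-bit samples by default)."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("frames must be non-empty and of equal size")
    mse = sum((a - b) ** 2 for a, b in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


def sequence_psnr(ref_frames: Sequence[Sequence[int]],
                  dist_frames: Sequence[Sequence[int]]) -> float:
    """Mean of per-frame PSNR values over a sequence (one common convention)."""
    values = [psnr(r, d) for r, d in zip(ref_frames, dist_frames)]
    return sum(values) / len(values)


if __name__ == "__main__":
    ref = [[0, 0, 0, 0], [0, 0, 0, 0]]
    dist = [[10, 10, 10, 10], [10, 10, 10, 10]]
    print(f"{sequence_psnr(ref, dist):.2f} dB")
```

Note that a single error-free frame makes the per-frame mean infinite; tools differ in how they cap or skip such frames, which is one reason reported PSNR averages are not always directly comparable.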

The experimental evaluation of the proposed transmission framework was conducted on several video sequences covering a wide range of content types in terms of motion, colors, texture, contrast and spatial detail, in order to identify a reasonable correlation between encoder settings, bit-rate and resulting quality, and to obtain findings that are as independent of the video type as possible. Specifically, the test sequences are the conventional BUS, FOREMAN, MOBILE, CREW, HARBOUR and FOOTBALL sequences in YUV format [29], which exhibit motion levels ranging from low to high. This set of video samples was selected to be representative of typical video feeds for the target cellular use case.

For the sake of simplicity and without loss of generality, the mean squared magnitude of the Rayleigh coefficients is normalized to unity: \(\mathrm {E}[|h_{\mathrm {TT}}|^2]=\mathrm {E}[|h_{\mathrm {PP}}|^2] =\mathrm {E}[|h_{\mathrm {SS}}|^2]=\mathrm {E}[|h_{\mathrm {SP}}|^2] =\mathrm {E}[|h_{\mathrm {PS}}|^2]=1\). The link distances between the network nodes, namely the direct link \(d_{\mathrm {SS}}\) and the interfering links \(d_{\mathrm {SP}}\) and \(d_{\mathrm {PS}}\), are on average about 500, 200 and 5000 m, respectively. About 5 TV channels are left unused at this location. The other parameters used in this article and their values are listed in Tables 1 and 2. Simulation results for the visual video quality are given in terms of PSNR and MS-SSIM.
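The channel assumptions above can be reproduced with a short Monte Carlo sketch in Python: Rayleigh coefficients are drawn as unit-power circularly symmetric complex Gaussians, so that \(\mathrm {E}[|h|^2]=1\), and combined with a distance-based path loss whose exponent is the \(\gamma =4\) stated earlier. The power-gain model \(|g|^2 = |h|^2 d^{-\gamma }\) (no reference distance, antenna gains or shadowing term) and the function names are our assumptions, not necessarily the exact channel model used in the Matlab simulations.

```python
import math
import random


def rayleigh_coefficient(rng: random.Random) -> complex:
    """Circularly symmetric complex Gaussian sample h with E[|h|^2] = 1."""
    sigma = math.sqrt(0.5)  # per real dimension, so that the total power is 1
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


def power_gain(h: complex, distance_m: float, path_loss_exp: float = 4.0) -> float:
    """Assumed model |g|^2 = |h|^2 * d^(-path_loss_exp) (no reference distance or shadowing)."""
    return abs(h) ** 2 * distance_m ** (-path_loss_exp)


if __name__ == "__main__":
    rng = random.Random(2024)
    samples = [abs(rayleigh_coefficient(rng)) ** 2 for _ in range(100_000)]
    print(f"empirical E[|h|^2] = {sum(samples) / len(samples):.3f}")  # close to 1
    for name, d in (("d_SS", 500.0), ("d_SP", 200.0), ("d_PS", 5000.0)):
        # Average power gain of each link (|h|^2 replaced by its mean of 1).
        print(name, power_gain(1.0 + 0.0j, d))
```

Under this model the 200 m interfering link \(d_{\mathrm {SP}}\) is \((500/200)^4 \approx 39\) times stronger on average than the 500 m direct link \(d_{\mathrm {SS}}\), which illustrates why the underlay transmit power must be tightly controlled.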

Table 2 SVC encoding parameters

Fig. 7 Overall secondary capacity versus arrival rate for cellular networks with and without spectrum sharing

Fig. 8 Total bit-rate as a function of arrival rate: theory versus simulation results

Figure 7 examines the impact of the arrival rate \(\lambda \) on the capacity of the secondary link for the proposed dual interweave and underlay strategy, compared with that achieved by interweave-only spectrum access. The capacity of the legacy cellular network, \((1 - P^{\text {LTE}}_{\text {B}}) C_{\text {LTE}}\), is also provided as a reference. The arrival rate was varied between 0 and \(10~s^{-1}\). The proposed dynamic spectrum sharing scheme clearly outperforms both the interweave mode and the conventional architecture. Secondary devices exploit the extra degree of freedom offered by the underlaid secondary signal, together with a larger number of available radio resources, to acquire a sufficiently high bandwidth, yielding up to an almost threefold capacity gain. In addition, the interweave scenario outperforms the legacy scheme, albeit being inferior to the proposed scheme. The latter observation confirms the potential of the spectrum sharing concept, whatever the chosen mode, to evolve cellular networks towards supporting higher traffic volumes without coexistence issues. The graph also exhibits a peak capacity at \(\lambda = 2.4~s^{-1}\), indicating that the proposed scheme allows cellular networks to keep up with increasing user data rates during peak hours, which remains a critical point for current mobile networks.

Fig. 9 Legacy carriers bit-rate for different \(\lambda \) values: theory versus simulation results

Figure 8 shows the total bit-rate \(b_{\mathrm {f}}\) of the extended cellular operational mode as a function of the arrival rate \(\lambda \), from both a theoretical and a simulation-based point of view. The theoretical curve agrees with the simulation results: the bit-rate retrieved using the FixedQPEncoder tool matches the theoretical bit-rate almost perfectly, which highlights the accuracy of the proposed adaptive bit-rate approach. The proposed MDSC is capable of handling the increased bit-rate offered by the additional radio resources. Figure 8 also reveals that activating the inter-layer prediction option helps the proposed framework rapidly decrease the bit-rate of the enhancement layer, so as to closely approximate the target bit-rates.

Figure 9 displays the influence of the arrival rate on the bit-rate of conventional cellular networks without spectrum sharing, for both the theoretical and the simulated performance assessment. The theoretical and simulation-based curves match closely over most of the \(\lambda \) range. A small mismatch is noticed for the FOOTBALL and MOBILE sequences when the arrival rate exceeds \(8~s^{-1}\). This can be explained by the complex still background and foreground motion of the MOBILE sequence and the high degree of motion characterizing the FOOTBALL sequence, which require more source coding bits to encode the I and P frames. However, the bit-rate mismatch remains small (about 5%). The proposed MDSC encoder is capable of correctly conveying the base layer description even for a larger number of connected users.

Fig. 10 Quantization parameter for even and odd sub-streams as a function of \(\lambda \): base versus enhancement layer

Fig. 11 PSNR of the received video as a function of arrival rate: legacy versus TVWS enhanced mode

Fig. 12 MS-SSIM of the received video as a function of arrival rate: legacy versus TVWS enhanced mode

Fig. 13 Screenshots of tested video sequences for the legacy cellular scenario: \(\lambda =2.4~s^{-1}\)

The quantization parameter values of the even and odd sub-sequences are plotted against the arrival rate \(\lambda \) in Fig. 10. The quantization parameter of the even sub-stream perfectly matches that of its odd counterpart. This is because the even and odd streams have quasi-identical characteristics and share a large amount of redundant information with regard to motion and scene dynamics. In addition, each plot related to the extended cellular mode has an optimal point at \(\lambda = 2.4~s^{-1}\), where the quantization parameter is minimized; this minimum Q value corresponds to the peak capacity observed in Fig. 7.

Fig. 14 Screenshots of tested video sequences for the legacy cellular scenario: \(\lambda =5~s^{-1}\)

Fig. 15 Screenshots of tested video sequences for the cognitive cellular scenario: \(\lambda =2.4~s^{-1}\)

Fig. 16 Screenshots of tested video sequences for the cognitive cellular scenario: \(\lambda =5~s^{-1}\)

Figure 11 shows the impact of the arrival rate \(\lambda \) on the PSNR of the received videos for both legacy and TVWS-enhanced cellular networks. The curve for cognitive cellular networks is concave, with a maximum PSNR at \(\lambda =2.4~s^{-1}\). Figure 11 also highlights the superior performance of the proposed scheme compared with the conventional case: expanding the spectrum pool with idle TV carriers results in better video quality at the client side, owing to the good penetration properties of TV wavelengths. Likewise, Fig. 12 displays the MS-SSIM of the received videos as a function of \(\lambda \) for the proposed transmission framework and the conventional cellular approach. The same observations as in Fig. 11 hold regarding the optimality of the plotted curves in terms of the achieved MS-SSIM. Moreover, increasing the arrival rate beyond the optimal point leads to a smooth deterioration of the perceived video quality in the TVWS-enhanced case, whereas the degradation is sharper in the legacy case. This is in line with the common observation that current cellular systems experience obvious limitations in user experience as the number of simultaneously active customers increases, which ultimately results from the static spectrum management policy. Such results justify the interest of various telecommunication stakeholders in the transition from 4G to beyond-4G generations.

For the visual quality assessment, we display the 30th frame of the transmitted video sequences. Figures 13 and 14 show the reconstructed videos in the legacy cellular case, and Figs. 15 and 16 show the results for the proposed spectrum sharing-based scenario. Depending on the number of successfully received descriptions, good overall video quality is obtained for \(\lambda =2.4~s^{-1}\) despite the presence of primary traffic interference and strong fading effects. When the network becomes saturated (\(\lambda \ge 2.4~s^{-1}\)), the TVWS-enhanced mode continues to show good visual quality, whereas the legacy mode yields a significantly degraded video (e.g., for \(\lambda =5~s^{-1}\)).

Fig. 17 Comparison of the proposed framework against the scheme in [22]: \(\lambda =3~s^{-1}\)

Figure 17 compares the proposed DSS-based framework with the state-of-the-art scheme proposed in [22] in terms of the achieved PSNR and MS-SSIM, plotted against the signal-to-noise ratio (SNR) at the secondary destination for a fixed \(\lambda =3~s^{-1}\). For a fair comparison, the same parameter settings are used for both schemes. As briefly reviewed in Sect. 2, [22] considers a different MDC scheme relying on the separation of even and odd frames under a simplified underlay CR channel model. For SNR values below 0 dB, the scheme in [22] performs slightly better than the proposed method, with a PSNR (respectively, MS-SSIM) gain of approximately 1.5% (respectively, 0.07%). As intuitively expected, the proposed solution does not benefit from the interweave degree of freedom at low SNR values, since only a low secondary transmit power is radiated. Nevertheless, this SNR range is not advantageous for the considered cellular use case, because it limits cognitive transmission to short-range communication systems. On the other hand, the proposed scheme clearly exhibits higher quality levels and largely outperforms [22] for higher SNR values (SNR \(\ge \) 6 dB). The quality gain becomes more pronounced as the SNR increases, as a result of the mixed underlay-interweave strategy, which allows higher secondary power levels as long as the broadcast channel is free. In this SNR range, the scheme in [22] converges to a steady state with constant PSNR and MS-SSIM values, because the transmit power in the underlay mode reaches the interference temperature limit and must remain below a given threshold (see Eqs. (3) and (4)) to sustain the throughput requirements of the incumbent users. For the SNR interval \(\left[ 0\,\mathrm {dB},6\,\mathrm {dB}\right] \), both schemes yield roughly similar performance.
For the considered transmission scenario, we conclude that, over a practical and suitable SNR range, the proposed solution surpasses the scheme in [22], with PSNR and MS-SSIM gains reaching 5.4% and 2.6%, respectively.

6 Conclusion

In this paper, we have studied the technical hurdles related to layered video streaming in cognitive cellular networks, using the scalable extension of the H.264 codec standard applied to separated even and odd sub-streams. When the spectrum is heavily accessed by users, the legacy cellular bands are not sufficient to accommodate the traffic volume, and the system under consideration thus allows devices to pick up free TV channels to help support the required data rate and improve the quality perceived by end users. Simulation results show that the achieved capacity is large enough to deliver all the video layers over the CCN, with an almost threefold capacity gain, while the legacy frequencies still deliver a basic video quality in non-saturated situations, ensuring backward compatibility with previous generations. The obtained results also show an MS-SSIM improvement of up to 24% for the tested video sequences when exploiting TVWS. These observations suggest that the fifth generation represents a major opportunity to boost the widespread deployment of dynamic spectrum sharing-based infrastructures for large-scale multimedia communications. Finally, it is worth noting that the proposed streaming scheme relies on UDP, which offers reduced transmission delay but provides no quality of service guarantees and can be blocked by firewalls. Since the transmission control protocol (TCP) remains the most widely used transport protocol for web activities and in error-prone environments, the proposed scheme is envisioned to evolve towards TCP support.