1 Introduction

Advances in the widespread wireless services and technologies, have led to a significant increase in demand of the available spectrum. The Federal Communications Commission (FCC) report indicates that, most of the allocated spectrum is significantly underutilized in temporal and spatial due to the inefficient static frequency allocation policy [1]. A cognitive radio network is an emerging technique to mitigate the spectrum scarcity, by allowing the Secondary Users (SUs) access to the underutilized licensed channels of PUs temporarily [2, 3].

The fixed CA strategy can be mapped to graph coloring algorithm, which is known as NP-complete problem [4]. Dynamic CA in CNs is also NP-complete, since this is even harder than the fixed CA scheme. As the size of the problem grows, NP-complete problems cannot be solved in polynomial time. To address this issue, heuristic or meta-heuristic approaches are often used to speed up the process in the event that exhaustive searches are impractical.

Learning and intelligence capabilities are two significant characteristics of the CN. Taking advantage of these features to predict the spectrum condition and the holes positions, can dramatically improve the performance of distributed CNs [5]. LA is one of the meta-heuristic methods that has been recently utilized to solve different NP problems efficiently. Because of close relation between the CA problem requirements in CN and inherent characteristics of learning automata (LA), using LA is a reasonable idea to cope with this problem.

The main goal of dynamic CA in CR environments is to increase the quality of SU’s transmission, while protecting PUs from interference. PUs have higher priority than SUs. When a PU appears in the channel, SUs have to vacate this channel and resume their unfinished transmissions on the new selected channels. Therefore, PUs will be protected from SUs interferences and their communication quality will be guaranteed. On the other hand, once the quality of communication is unsatisfied, channel switching can be a helpful task for increasing the SU’s QoS.

Channel switching is an expensive task due to the communication time overhead. In addition, in some cases like a real-time data transmission, the interruptions in communication caused by the frequent channel switching, are unacceptable.

Therefore, In this paper, two fully distributed learning based, QoS aware CA methods are proposed with the following main contributions:

  • The online feature of the learning automata is used for adapting to the dynamic characteristic of CNs. This, helps to predict the future of spectrum conditions. In addition, here, parameters of the LA are traffic aware and tuned dynamically, in order to increase its adaptability.

  • In addition, appropriate using of spectrum holes without interfering with PUs and a tradeoff between several QoS requirements of SUs (i.e., throughput, delay, data delivery ratio, data lost) will be made.

  • The channel switching cost will be reduced to a desirable level, while keeping the communication quality in a good condition.

  • Interruptions caused by several channels switching will be decreased and hence, a communication with an appropriate level of disruption is setting up.

Figure 1 depicts a summary of the contributions.

Fig. 1
figure 1

Summary of the paper contributions

The rest of this paper is organized as follows: In Sect. 2, the related works are discussed in details. The theory of LA will be proposed in Sect. 3. In Sect. 4, system assumptions and models are introduced. Proposed algorithms and their flowcharts are described in Sect. 5. In Sect. 6 a comprehensive analysis of simulation results will be provided. Finally, in Sect. 7 article is concluded and some interesting research issues for future works are presented.

2 Related Works

In the context of the CA problem in the CN, many researches have been carried out recently, that can be generally classified as Fig. 2.

Fig. 2
figure 2

Classification of CA approaches

Several graph-theory related solutions such as network conflict graph [6, 7], Graph coloring [8,9,10], bipartite graph [11, 12], and factor graph [13] have been proposed to cope with CA problem in CN. However, these approaches cannot incorporate all of CN parameters such as QoS requirements and Adjacent Channel Interference (ACI).

In the literature, many works have used game theory to deal with the CA problems in CN [14,15,16,17]. This powerful mathematical tool can represent the cooperation and competition between SUs quite well. However, is it difficult to structure the game in such a way that always equilibrium is guaranteed.

Another commonly used mathematical technique for solving channel allocation problem in the CN is linear programming [18, 19]. The Advantage of using this approach is that the existing LP techniques can be utilized. But, the channel allocation can only be formulated as a Mixed Integer Non-Linear Programming (MINLP). MINLP is NP-hard and hence is not scalable. To address this, it should be transformed to Binary Linear Program (BLP). This transformation cannot be guaranteed because, it requires several assumptions for converting continuous variables to binary, which may not always be valid [20].

As noted earlier in Sect. 1 CA problem in CN is an NP-complete and hence often heuristic and meta-heuristic approaches should be used to speed-up the process of finding a sub-optimal solution in a scalable way.

Heuristic approaches do not have a specific algorithmic solution. Therefore, many simple algorithms have proposed in this category to tackle the problem of CA in CN [21,22,23]. Heuristic algorithms are simple and can be easily implemented. In theory, these methods are problem-independent and show less sensitivity to changes in the data quality and problem characteristics. However, most of the proposed heuristic solutions are problem-specific and therefore, cannot be used for accomplishing other problems. In addition, they may trap into local optima that may be far from the global optimal solution. Finally, there is no methodology for studying about the convergence of these methods.

Meta-heuristic based methods can be categorized in three sub-classes. The first one is fuzzy logic [24, 25], that can achieve the solution quickly based on the predefined rules. In addition, learning techniques can help for improving the quality of this solution. However, a fuzzy system is not scalable and needs too many rules for covering all the network parameters. Finally, because of the dynamic nature of CNs, definition of accurate rules is so hard. The second subclass of the reported meta-heuristic based methods is evolutionary algorithms, which include genetic algorithm [26,27,28], harmony search [29] and swarm intelligence [30, 31]. The arbitrary goals and constraints can be managed by these approaches. In addition, inappropriate solutions will be ignored simply. However, the process of finding a good solution is very slow and can lead to find a local optimal solution rather than the global one. The last subclass is reinforcement learning-based (RL based) algorithms [16, 32,33,34,35]. RL helps agents be compatible with the dynamic and uncertain environment. The proposed method in this paper uses the LA which can be placed in this category. In recent years, some ideas have been proposed to cope with the CA problem by using LA [36,37,38,39,40] as well, that will be discussed as follows:

In [36] the authors proposed a stochastic channel selection algorithm based on LA with the aim of reducing the channel switching times. The action probability vector changes according to the environmental feedback in a way that, a channel with the highest probability of successful transmission will have a higher chance for selection in the future epochs. This algorithm, tries to minimize the collision probability to avoid the costly channel switching. However, since channel quality and throughput are ignored in this algorithm, the QoS satisfaction cannot be guaranteed.

The proposed algorithm in [36] has been extended in [37]. Here, the collision probability and channel quality are considered simultaneously with the aim of maximizing the throughput. Both [36, 37] assume a network with stationary condition for the PU traffic distribution. In other words, the probability that a PU uses the communication channel does not change over the time. However, in the practical networks, SUs should communicate in a non-stationary environment.

In [38] a LA based CA algorithm has been proposed for adapting to the dynamic environment of CN. In this work, with tracking the notable changes in the reward estimator vector, altering the environment is recognized. After detecting this change, the algorithm is reinitiated, which is a very time-consuming task. In addition, this algorithm the same as [36, 37] only is developed for a single SU scenario and the effects of other SUs are not considered in the network behavior. Finally, in [38] like [36] QoS requirements are not guaranteed, because the channel quality and throughput are neglected.

In [39] another LA based spectrum allocation algorithm was proposed with the aim of reducing the delay, packet loss, and communication cost because of the frequently channel switching. In this work, a central broker collects the information about the states of the spectrum and demands of users into a database. Central broker controls the spectrum allocation through this information. This algorithm, unlike [36,37,38], regards the competition between SUs in the network. In addition, it tries to guarantee some of the QoS requirements (e.g. delay, fairness). For improving the fairness, one user can only use one channel and for avoiding conflict one channel can only be used by one user. Both of these, reduce the channel utilization. In addition, the overhead of obtaining and updating the network information are not negligible. Finally, the broker is a single point of failure and hence the network reliability cannot be guaranteed.

In [40] a multi response LA based algorithm for dynamic spectrum access in CN was introduced. Each SU is equipped with a LA. The LA has several actions equal to the number of primary channels. In addition, an admission control mechanism was proposed in order to restrict the number of competing SUs and reducing the rate of collisions between them. In this scheme, each SU is admitted with probability ψ (Action mode) and blocked with probability 1 − ψ (No Action mode). When the rate of the collision increases, ψ should be decreased in order to decrease the number of SUs that are competing for each channel and vice versa. To promise fairness, the ψ parameter of each SU that is blocked in the current time, will be increased. This algorithm improves some of QoS requirements (e.g. total throughput, the number of switching, fairness). Moreover, in this work the competition among SUs is considered unlike the most of learning based algorithms that have been introduced later. Disadvantages of this algorithm will be discussed in Sect. 6.

In the most of the previous works [33, 36,37,38,39,40], an SU switches the communication channel only when a PU appears in this channel. Their authors argue that, channel switching is a very costly task and should be avoided as much as possible. In some other works [32, 35], cognitive users not only switch the channel when a PU appears in it, but also vacate the communication channel when their quality becomes unacceptable. This is because, in these works satisfying of some QoS requirement (e.g. blocking and dropping probability) is more important than the overhead of switching. However, they also try to reduce the switching costs as much as possible.

All the above methods only decreased the overhead of channel tuning by reducing the number of switching by selecting the suitable channels for communication. However, even afterwards this overhead may be significant. Therefore, new methods are needed to overcome this problem.

3 Learning Automata

A learning automaton is an adaptive decision-making entity that is one of the main fields of artificial intelligence. The automaton attempts to find a solution for the problem without any information about suitable action [41]. LA improves its operation through learning how to choose the optimal action from a finite set of available actions by interacting repeatedly with a stochastic environment. The action is chosen randomly based on a probability distribution. At each instant, the given action serves as an input to the random environment. The environment, responds the taken action in order to the reinforcement signal. Then, the internal information of LA is updated based on the given feedback from the environment [42, 43]. Afterwards, LA adjusts its action repeatedly until a termination condition is satisfied. Figure 3 shows the relationship between a learning automaton and its random environment.

Fig. 3
figure 3

Learning automata diagram

Every environment is represented by: E = {α, β, c}, where α = {α1, α2, …, αr} is a set of inputs, β = {β1, β2, …, βr} is a set of outputs, and c = {c1, c2, …, cr} is a set of penalty probabilities, that ci denotes the penalty probability of each taken action ai. The environments depending on the nature of the reinforcement signal β can be categorized into three types: P-model, Q-model and S-model. If the β set is binary, the model is known as a p-model. In this model, β1 = 1, β2 = 0 represent the penalty and reward, respectively. Similarly, a model is called the Q-model if the β set contains a finite number of values in the interval [0, 1]. On the other hand, in the S-model, β set has an infinite number of members in the interval [0, 1].

LA can be classified into a fixed structure (FSLA) and variable structure (VSLA). In the FSLA, the transition and the output functions are time invariant. A learning automaton with variable structure is represented by {α, β, p, T}. In the definition of a VSLA, the LA is completely defined by a set of actions as α = {α1, α2, …, αr}, (one of which is the output of the automata), a set of inputs as β = {β1, β2, …, βr} (which is usually the response of the environment), action probability vector as p = {p1, p2, …, pr} and finally learning algorithm as p(n + 1) = T[α(n), β(n), p(n)]. The learning automata randomly chooses one action according to its probability vector (pi). The LA chooses action αi and applies it to the environment. Thereafter, LA receives a reinforcement signal from the environment and then updates its action probability vector. If the response from the environment is desirable, automata updates it’s action probability vector as follows:

$$\begin{array}{*{20}l} {{\text{p}}_{\text{i}} \left( {{\text{n}} + 1} \right) = {\text{p}}_{\text{i}} \left( {\text{n}} \right) + {\text{a}}\left( {1 - {\text{p}}_{\text{i}} \left( {\text{n}} \right)} \right) } \hfill & {{\text{a}}\left( {\text{n}} \right) = {\text{a}}_{\text{i}} } \hfill \\ {{\text{p}}_{\text{j}} \left( {{\text{n}} + 1} \right) = {\text{p}}_{\text{j}} \left( {\text{n}} \right) - {\text{ap}}_{\text{j}} \left( {\text{n}} \right) } \hfill & {{\text{a}}\left( {\text{n}} \right) = {\text{a}}_{\text{i}} , \forall {\text{j}}, {\text{j}} \ne {\text{i}} } \hfill \\ \end{array}$$
(1)

In contrary, if the environmental feedback is unfavorable, the action probability vector will be updated as follows:

$$\begin{array}{*{20}l} {{\text{p}}_{\text{i}} \left( {{\text{n}} + 1} \right) = \left( {1 - {\text{b}}} \right) p_{i} \left( {\text{n}} \right)} \hfill & {{\mathbf{a}}\left( {\mathbf{n}} \right) = {\mathbf{a}}_{{\mathbf{i}}} } \hfill \\ {{\text{p}}_{\text{j}} \left( {{\text{n}} + 1} \right) = \frac{\text{b}}{{{\text{r}} - 1}} + \left( {1 - {\text{b}}} \right){\text{p}}_{\text{j}} \left( {\text{n}} \right) } \hfill & {{\text{a}}\left( {\text{n}} \right) = {\text{a}}_{\text{i}} , \forall {\text{j}}, {\text{j}} \ne {\text{i}} } \hfill \\ \end{array}$$
(2)

In formula (1) and (2), a and b denote reward and penalty parameters respectively. If a and b are equal, the algorithm is called LR–P. Whereas, for b ≪ a the algorithm is LRεP. Finally, if b = 0 then the algorithm is said to be LR–I.

In what follows, some principles will be expressed to clarify the reasonable use of LA to solve the problem of CA in CN; The first reason is that, CA problem is an NP problem [4] which can’t be solved in polynomial time order. On the other hand, learning automaton is a Meta-heuristic algorithm that can overcome this problem. In addition, for CA in CN, the behavior of PUs and other SUs in the network should be accurately followed. Since, the nature of CN is unpredictable, an appropriate manner is needed to adapt to this dynamicity. LA is a dynamic and adaptive entity that tries to find an online solution, which can be helpful for optimization of CA problem in the non-stationary environment. Moreover, because of the dynamic nature of CNs, it is so hard to gather accurate and updated information about the network. The main characteristic of an LA is that, it learns the optimal action through interactions with the environment, without any prior knowledge. The environment can be treated as a “black box”, and LA only needs the feedback signal.

Table 1 summarizes the aforementioned principles which can map the requirements of the CA problem to capabilities of LA.

Table 1 Requirements of the channel assignment problem and capabilities of the learning automata

4 System Model

In this Section, the system model and the basic policies, assumptions and limitations of the proposed method are presented.

A fully distributed and non-cooperative cognitive radio network is considered, which is consisting of M SUs (that are located in the fixed and random manner), N PUs and N + 1 channels (N data channels and 1 control channel). Every SU is equipped with R + 1 radio interfaces (R radios for data and 1 radio for control communication). Each of the R data radio can only transmit, receive or sense in data channels at one time. Control radio is devoted for operating over the dedicated control channel.

Traffic model of PU in the channel (presence or absence of the PU signal) follows a continuous time Markov chain with two states: idle (OFF) and busy (ON). This model has been used widely in the literature (e.g., [40]).

SUs use an overlay manner [5] for utilizing the PU’s channel. In this method, an SU only transmits opportunistically over the licensed spectrum once the PU is in idle period. As soon as a PU is detected on the channel (transits to ON state), SUs are forced to vacate the occupied channel. In other words, PUs have the preemptive priority to access the channels.

For quantifying the quality of a link, signal to interference plus noise ratio (SINR) is applied. Equation (3) calculates the SINR of a packet that is sent from a sender SUi to the receiver SUj when SUk and SUi are using the same channel [44].

$$SINR = \frac{{P_{i} G_{ij} }}{{\mathop \sum \nolimits_{k \ne i} P_{k} G_{kj} + \sigma^{2} }}$$
(3)

where Pi is the transmission power by transmitter i and σ2 is the noise. G ij is the channel gain and can be mathematically expressed as Eq. (4) [44]. Here, d0 the far-field crossover is distance and η is the isotropic path loss exponent.

$$G_{ij} = \left[ {\frac{{d_{0} }}{{\sqrt {\left( {x_{j} - x_{i} } \right)^{2} + \left( {y_{j} - y_{i} } \right)^{2} } }}} \right]^{\eta } \quad \forall i,j\;\; and\;\; i \ne j$$
(4)

According to Eq. (3) noise of the communication channel is inversely proportional to the SINR of received packets. On the other hand, it is clear that, noise is an inevitable problem in wireless network’s channels. Hence, to avoid the wrong decision influenced by sudden noises, a new metric for decision making about the quality of communication channels is considered. This metric is called as SINR-history (\({\text{SINR}}^{*}\)).\({\text{SINR}}^{*} ({\text{n}} + 1)\) is calculated as follows:

$$SINR^{*} \left( {n + 1} \right) = SINR\left( {n + 1} \right) + \left( {1 - \alpha } \right)SINR^{*} \left( n \right)\quad 0 \le \alpha \le 1$$
(5)

where 0 ≤ α ≤ 1 is the historical rate. In Eq. (5) the greater amount of α gives more weight to the last packet’s SINR. On the other hand, the smaller value of α, grants more importance to the SINRs of the past epochs.

SINRThr is the minimum level of SINR for acceptable communication signal quality. In other words, the channel quality in transmission from SUi to SUj is acceptable if the SINR of the received packet at node j is greater than SINRThr. Packets with unfavorable quality should be discarded. In addition, in this paper, another level of SINR is introduced for decision making when the channel quality is degraded. This level is called SINRCritical and can be expressed as follows:

$$SINR_{Critical} = SINR_{Thr} + \Delta$$
(6)

where ∆ > 0 is a small value. This level, helps us predict about quality of channels in the near future. Critical range can be expressed as follows:

$$SINR_{Thr} < SINR^{*} \le SINR_{Critical}$$
(7)

In other words, if after receiving a packet, \({\text{SINR}}^{*}\) exists in the critical range, it can be predicted that this channel quality is not desirable for continuing this communication. In addition, this degradation is not influenced by the sudden noise.

4.1 Channel Switching Model

Channel switching is the process of changing the operational channels by an SU. In this paper, the communication channel will be changed in two states. As mentioned earlier, the overlay model is applied for spectrum sharing among SUs and PUs. Hence, at first, when the PU appears in the channel, SUs have to vacate the occupied channel. Second, continuing communication will be prevented in the low-quality channel by channel switching.

Channel switching process requires sensing time, handshaking time and exchanging time [45]. Tswitching can be calculated as follows:

$$T_{switching} = T_{sen\sin g} + T_{handshaking} + T_{exchanging}$$
(8)

where Tsensing denotes a time that a SU is needed to select a new suitable channel. Thandshaking is the required time for negotiation about the selected channel between sender and receiver. Texchange is the duration time, which an SU needs for changing frequency band from current channel to the new selected channel.

Channel switching may degrade the SU’s performance by incurring longer delay and disrupts temporary communication [45]. In the first proposed algorithm, this delay and interruption will be imposed to the network for all channels switching. However, the second algorithm tries to prevent from delay and interruptions of channels switching through utilization of Backup Radio (BR).

In what follows, the properties and operation mechanism of BR are explained.

4.2 Backup Radio Model and Mechanism

In a multi radio network, all of radios are not often active simultaneously in all the times. During the communication, if an SU predicts that channel quality will not be acceptable in the near future and the SU has some idle radios, the BR is selected randomly from these radios. Under the assumption of every radio will be active (sending, receiving or sensing) with independent probability μ, the probability that at least one BR is available can be expressed as follows:

$$P_{UnusedRadio} = 1 - \mu^{R - 1}$$
(9)

where R is the number of data radios. In other words, P UnusedRadio is the probability of finding a BR when the parallel operation is needed. For example, if an SU equips with five data radios and each of the radios is active in 75% of the time, then in 70% of the time BR can be found.

In the second proposed algorithm, when the critical condition occurs (according to (7)) communication in the current channel is not stopped. Instead, the algorithm tries to find a BR, activates it and then starts the process of selecting a new channel with sending operation (which is carried out by main radio) simultaneously. The BR activity includes selecting a new channel (by automata) and then sensing selected channel to ensure that PU is not active in it. The time of these activities is equal to T sensing in the channel switching process. After selecting a desired channel, handshaking process will be done with T handshaking . Controlled radio performs the handshaking process in parallel to the main radio. Through this, controlled data will be exchanged between sender and receiver. After handshaking, main radio stops sending operation and after a negligible delay (T exchanging on receiver radio), BR continues to data sending operation in a new suitable channel.

Figure 4 shows how the BR helps the main radio reduce the costs of channel switching.

Fig. 4
figure 4

BR mechanism

4.3 Learning Model

In proposed model, every SU is equipped with one LA. The action set of learning automata is represented by ch = {ch 1, ch2, …, chN} and Action probability vector by p = {p 1, p2, …, pN}, in which N is the number of channels. The probability of selecting channel i (i = 1..N) is represented by the value of the corresponding index in action probability vector. When an SU has to select a new suitable channel to resume its unfinished transmission, LA randomly chooses one channel according to its probability vector. When data transmission is finished through the selected channel, the learning automaton updates its action probability vector proportional to communication quality.

As mentioned in Sect. 3, LA uses two parameters a and b for updating action probability vector. In this paper, these parameters are calculated dynamically, with the aim of adapting to the dynamic nature of CNs. a and b can be expressed as follows:

$$0 < a = \frac{SINR}{{SINR_{max} }} < 1 \quad \forall SINR \ge SINR_{Thr}$$
(10)
$$0 < b = \frac{{SINR_{Thr} - SINR}}{{SINR_{max} }} < 1\quad \forall SINR < SINR_{Thr}$$
(11)

As shown in Eqs. (10) and (11) a and b dynamically change with the quality of communication. Therefore, Eqs. (1) can be updated as (12) in the case that the received signal is desirable.

$$\begin{array}{*{20}l} {p_{i} \left( {n + 1} \right) = p_{i} \left( n \right) + \frac{SINR}{{SINR_{max} }}\left( {1 - p_{i} \left( n \right)} \right)} \hfill & {{\text{a}}\left( {\text{n}} \right) = {\text{a}}_{\text{i}} } \hfill \\ {p_{j} \left( {n + 1} \right) = p_{j} \left( n \right) - \frac{SINR}{{SINR_{max} }}p_{j} \left( n \right) } \hfill & {{\text{a}}\left( {\text{n}} \right) = {\text{a}}_{\text{i}} ,\,\forall {\text{j}}, {\text{j}} \ne {\text{i}} } \hfill \\ \end{array}$$
(12)

Similarly, Eq. (2) is updated as (13) when feedback from the environment is undesirable.

$$\begin{array}{*{20}l} {p_{i} \left( {n + 1} \right) = \frac{SINR}{{SINR_{max} }}p_{i} \left( n \right)} \hfill & {{\text{a}}\left( {\text{n}} \right) = {\text{a}}_{\text{i}} } \hfill \\ { p_{j} \left( {n + 1} \right) = \frac{SINR}{{\left( {r - 1} \right)SINR_{max} }} + \left( {1 - \frac{SINR}{{SINR_{max} }}} \right) } \hfill & {{\text{a}}\left( {\text{n}} \right) = {\text{a}}_{\text{i}} ,\;\forall {\text{j}},{\text{j}} \ne {\text{i}} } \hfill \\ \end{array}$$
(13)

Thus, channels receive a punish or a reward corresponding to their quality. Using this method we are able to accurately differentiate the channels from each other in terms of quality.

5 Proposed Algorithms

In this Section, two dynamic channel assignment algorithms based on LA are proposed in a fully distributed fashion. These algorithms will be run on the all individual transmitter–receiver pairs and no information will be exchanged between every two users that already don’t communicate. The first algorithm is called QASA (Qos aware learning Automat base Spectrum Allocation) and the second algorithm is named DQASA (Delay and Qos aware learning Automat base Spectrum Allocation).

5.1 Preparation Phase

In the preparation phase, two operations should be taken. First, the action probability vector is set with the same initial value for all actions i as follows:

$$p_{i} \left( t \right) = \frac{1}{N} \forall i; t = 0$$
(14)

where N is the number of channels. With this initialization, all channels have an equal chance to be selected at the first time. Second, the SINRk-max matrix should be calculated for SUk that is expressed as follows:

$$SINR_{{k{ - }max}} = \left[ {\begin{array}{*{20}c} {SINR_{max\,11} } & \cdots & {SINR_{max\,1n} } \\ \vdots & & \vdots \\ {SINR_{max\,m1} } & \cdots & {SINR_{max\,mn} } \\ \end{array} } \right]\quad 0 < k < m + 1$$
(15)

This matrix will be calculated and saved in the memory of each node k, in the preparation phase. The columns and rows of SINRk-max indicate channels and receiver nodes, respectively. Every element of this matrix shows the SINRmax ij(i = 1..m, j = 1..n). That is, the maximum SINR in transmission from node k to receiver node i (i = 1..m) through the channel j (j = 1..n). For calculating SINRk-max matrix, this scenario is considered: one SU is a sender and other secondary nodes are receivers. The sender k transmits (m − 1)n packets to (m − 1) receivers through all n channels one by one. Upon receiving a packet, SINR is calculated and saved in the given matrix. This process will be repeated for all SUs. Since at one time only one communication occurs, interference is minimized and calculated SINR gets its maximum value.

5.2 QASA: First Proposed Algorithm

In this Section, the operation and stages of the first proposed algorithm will be illustrated. The flowchart of QASA is shown in Fig. 6. The algorithm will be started after receiving the first packet of a new data flow. Then, SU tries to select a suitable channel by the LA. This operation will continue until the SU finds a free channel. Then, packets will be sent by SU through selected channel. Afterwards, in destination the quality of packet will be evaluated and accordingly, the following scenario will occurs:

In case of (SINR < SINRThr) and (SINR* ≥ SINRThr + ∆), the packet is discarded; selected channel is penalized; action probability vector is updated, and communication will continue through the channel.

  • In case of (SINR < SINRThr) and (SINR* < SINRThr + ∆), the packet is discarded; selected channel is penalized; action probability vector is updated; communication through the used channel is stopped, and SU retries to select a suitable channel.

  • In case of (SINR ≥ SINRThr), used channel is rewarded; action probability vector is updated, and finally communication will continue through the channel.

This algorithm will be run for any generated data flow in senders and finishes when the last packet of this flow is sent successfully. The corresponding algorithms and the flowchart of QASA is shown in Figs. 5 and 6 respectively.

Fig. 5
figure 5

Handle message algorithm of DQASA

Fig. 6
figure 6

Flowchart of the first proposed algorithm

5.3 DQASA: Second Proposed Algorithm

In this Section, the operation and stages of the second proposed algorithm will be explained. This algorithm is suggested with the aim of reducing the channel switching overhead of the first algorithm. The flowchart of DQASA is depicted in Fig. 8. This algorithm like QASA will be started after receiving the first packet of a new data flow. Afterwards a new suitable and free channel is selected by LA and packet is sent through it. In the receiver SINR and SINR* will be calculated and accordingly, the following scenario will occure:

  • In case of (SINR < SINRThr) and (SINR* ≥ SINRThr + ∆), the packet is discarded; selected channel is penalized; action probability vector is updated, and communication will continue through the channel.

  • In case of (SINR < SINRThr) and [(SINR* < SINRThr) or [(SINRThr ≤ SINR* < SINRThr + ∆) and (all radios are active)]], the packet is discarded; selected channel is penalized; action probability vector is updated; communication through the used channel is stopped, and SU retries to select a suitable channel.

  • In case of (SINR < SINRThr) and (SINRThr ≤ SINR* < SINRThr + ∆) and all radios are not active, BR will be chosen randomly among inactive radios; LA selects a new channel; BR senses the selected channel; control information is exchanged between control radios; sending operation through main radio is stopped and communication will be continued by BR.

  • In case of (SINR ≥ SINRThr), used channel is rewarded; action probability vector is updated, and finally communication will continue through the channel.

This algorithm like QASA will be run for any generated data flow in senders and finishes when the last packet of this flow is sent successfully. In Figs. 7 and 8 the corresponding algorithms and the flowchart of DQASA is depicted.

Fig. 7
figure 7

Handle message algorithm of DQASA

Fig. 8
figure 8

The flowchart of second proposed algorithm Evaluation of performance

6 Evaluation of Performance

Performance of the proposed method is evaluated through extensive simulations of the algorithm using OMNeT++ [46] which is a component-based, open-architecture, modular and discrete event network simulator. In addition, INET–MANET framework feature was considered for simulating multi radio multi channel networks.

Network nodes were deployed on a plane of 500 * 500 m2. The number of SUs is set to M = 50, that are stationary and distributed uniformly. In the proposed model, 11 channels were considered with one dedicated control channel. The number of PUs is equal to the number of data channels and was set to N = 10. Every SU was equipped with one control radio and the number of data radios varies between 1 ≤ R ≤ 10. Simulations run for 100 s.

Extensive simulations carried out under different scenarios and network settings. The results are compared for the following network parameters:

  • Channel switching rate (CSR): is the rate of switching caused by degradation of channel quality. This rate is calculated as the total average number of secondary channels switching (except those are due to the presence of PUs), divided by the simulation period.

  • Data delivery ratio (DDR): is the ratio of the average amount of data that all the senders send to that of all the receivers receive on the network during the simulation periode.

  • Throughput: is defined as the rate of data received successfully by the MAC layer of all the secondary receivers over the simulation period.

  • Data los (DL)t: is the difference between the average amount of data sent and average amount of data received.

  • Communication delay (CD): is the time between a packet’s creation and its delivery. One of the main resources of the CD is the queuing delay, which is the time a packet waits in the queue to be scheduled and is inversely proportional to the DDR. The lower amount of DDR is proportional to more dropped packets. By increasing the number of dropped packets, the queue length and as a result queuing delay will be increased.

DQASA and QASA are evaluated in terms of the aforementioned QoS parameters compared to other representative approaches namely MRLA [40], RND and QRSA (QoS aware Random Spectrum Allocation). In RND like MRLA channel are switched only when a PU appears in it. However, RND selects a new channel randomly. Also, QRSA operates like QASA but it selects a new channel randomly. It should be noted that, all of these algorithms are classified into two categories. In the methods of the first category (i.e., QRSA, QASA and DQASA), SU not only switches the channel in the presence of PU, but also changes it when the channel quality is degraded. In the second category (i.e., MRLA and RND), different methods tend to switch the channels after a PU appears.

In what follows, the results of conducted experiments are represented to investigate the performance of two proposed schemes. Three sets of simulations are considered in these experiments. In the first one, the results are represented during first 100 s in which every SU is equipped with 5 radios. The second one presents the results in time instance 100 s in which the number of radios vary from 1 to 10 per each SU. Finally, in the third set, the cumulative results are shown in time instance 100 s when each SU is equipped with 5 radios.

6.1 Experiment 1: Channel Switching Evaluation

This experiment was conducted to evaluate the channel switching rate as well as total number of channel switching in two proposed methods compared to the other methods. It should be noted that, in DQASA when there is a chance to find a BR the channel switching delay is negligible. From the results reported in Fig. 9 in all of compared algorithms, in the beginning of the simulation, the network traffic is light, the quality of the channels is desirable and therefore, the CHR is low. The CHR in different algorithms will be stable around 50 s. The results of Figs. 9 and 10 demonstrate that the rate and total number of channel switching in learning based algorithms are significantly less than other comparative algorithms especially after stability. The reason is that, learning based schemes get a higher selection chance to the channels with successful experiences of transmission in the past epochs. Hence, these algorithms gradually tend to select the channels with better quality and so SUs are not forced to change communication channels frequently. In addition, as seen in Fig. 10, although MRLA provides lower total channel switching than QASA, DQASA not only recovers this weakness (by using BR) but also outperforms the MRLA. This superiority thanks to better decision making metric in DQASA (SINR) in contrary to the decision making metric in MRLA (presence or absence of other SUs in channel). By increasing the number of radios per each SU, the transmission capacity, number of channel switching and the network traffic will be increased. With the enhancement of the radios, DQASA has a higher chance to find a BR and as a result, the number of channel switching that are due to the channel quality degradation will be decreased and gradually will be converged to 0. That is why the results of Fig. 11 shows that increasing the number of radios increases the number of channels switching in all algorithms, except in DQASA.

Fig. 9
figure 9

Channel switching rate versus time (R = 5)

Fig. 10
figure 10

Total number of channel switching in Seconds between (1–50) and between (51–100) (R = 5)

Fig. 11
figure 11

Total number of channels switching after stability versus number of radios per each SU

6.2 Experiment 2: Throughput Evaluation

The objective of this experiment is to assess the resultant throughput and sent data rate of two proposed algorithms compared to the other algorithms. From the Fig. 12, the gap between the chart associated with these two parameters in the methods of the second aforementioned category is larger than that of other methods. This large gap is due to the large number of dropped packets caused by the communication through bad quality channels in the methods of this category. In addition, DQASA shows higher throughput than QASA. This superiority thanks to higher sent data rate due to the lower switching delay in DQASA. From the results reported in Fig. 13 both proposed methods provide more attained throughput compared to the other approaches. This superiority is due to channel switching in channel degradation quality times that guarantees the quality of received packet (in comparison with RND and MRLA) and employing LA (in comparison with RND and QRSA). As shown in Fig. 14, by increasing the number of radios, DQASA shows the better attained throughput than other methods. The reason is that, by increasing the number of radios per each SU, increasing the BRs activity leads to a significant degradation in blocking and dropping probability in the DQASA. Therefore, the greater amount of data will be successfully received at the destination.

Fig. 12
figure 12

The rate of the sent and received (throughput) data (R = 5)

Fig. 13
figure 13

Throughput versus time (R = 5)

Fig. 14
figure 14

Throughput versus number of radios per each SU

6.3 Experiment 3: Data Delivery Ratio (DDR) Evaluation

In this experiment, the aim is to evaluate the attained DDR in proposed algorithms in comparison with other algorithms. As it can be seen in Fig. 15, in the beginning of the simulation, DDR in all compared methods are close to each other. But in the methods of the firs category DDR increases and converges to 1 gradually. The reason is that, In the methods of this category, when the quality of channel degrade, communication via this channel will be spotted and continued through a suitable channel and so the higher percentage of the sent data are received successfully at the destination. By increasing the number of radios the traffic in the methods of the first category, will be better distributed among the channels and so the quality of network communications will be enhanced. Therefore, DDR will be increased and converged to 1. But in the methods of the second category, by increasing the network traffic due to utilization of the more radios per each SU, the channels quality and as a result DDR will be decreased. That is why the results of Fig. 16 depict the ascending trend in the methods of the first category in contrary to the descending trend in other methods. In addition the superiority of two proposed methods in comparison with QRSA in Fig. 16b thanks to inherent characteristics of LA, that leads to selecting the channels with higher chance of successful communication even after enhancement of the traffic of the network.

Fig. 15
figure 15

Data delivery ratio versus Time (R = 5)

Fig. 16
figure 16

Data delivery ratio versus Number of radios in SU a all of methods, b the first category methods (is shown for more clarity)

6.4 Experiment 4: Data Lost (DL) Evaluation

This experiment is mainly conducted to evaluate the achieved DL in proposed methods compared to the other methods. It is clear from Fig. 17 that, the total DL in the first category is significantly lower than second category methods. This large difference is because, in the methods of the second category, when the packet is dropped due to the channel’s quality degradation SU tries to resend this packet, again through the current bad quality channel. As a result, this channel will become more congested, and therefore, data loss will be increased. This happens while, many high-quality channels may be available in the network. In contrary to the methods of the first category, lost packets will be resend via a new suitable channel and so, will be successfully delivered with higher probability. similarly, as seen in Fig. 18a, b by increasing the number of radios per each SU in the methods of the first category, packets will be better divided between channels. Hence, not only the amount of lost data will be degraded, but also, resent packets will be delivered with higher chance.

Fig. 17
figure 17

Total data lost in the 100 s (R = 5)

Fig. 18
figure 18

Data lost versus Number of radios per each SU a for all of methods b for the first category methods (is shown for more clarity)

6.5 Experiment 5: Delay Evaluation

In this experiment the objective is to study the achieved delay in proposed methods compared to the other methods. By comparing Figs. 9, 15 and 19 the following points can be found:

Fig. 19
figure 19

Average delay of any packet

  • Although MRLA and DQASA have almost the same channel switching curves, DQASA shows lower delay than MRLA. This superiority thanks to higher DDR in the DQASA that leads to lower queue length and delay.

  • Despite QASA shows higher channel switching rate than MRLA because of the higher DDR, QASA has the less average delay than MRLA.

As shown in Fig. 20, by increasing the number of radios and transmission capacity, lower data will be waited in the queue and hence, the queuing delay and as a result average delay of packets will be decreased in all methods. In addition, In DQASA with 6, 8 and 10 radios average delay will not perform any important change and is almost close to 0. This indicates that when SU is equipped with 6 radios (and more), the channel switching and queuing delays will be minimized (almost zero) and only slightly propagation delay will be occurred. By increasing the number of radios the chance of finding a BR in DQASA will be enhanced. Hence, channel switching overhead will decrease significantly because of parallelizing the operation of channel switching and data sending. That is why the distance between the average delay of DQASA and QASA will be increased by increasing the number of radios.

Fig. 20
figure 20

Average delay of any packet versus number of radios per each SU

7 Conclusion and Future Works

In this paper, the problem of channel allocation in dynamic cognitive networks was investigated, which is known as NP-complete. Most of the previous works only have tried to reduce the overhead of channel tuning by reducing the number of switching through selecting the suitable channels for communication. In this paper, two fully distributed learning automata based channel allocation algorithms were proposed. In the first proposed algorithm, it is tried to satisfy the QoS requirements by prevention and avoidance of communication via bad-quality channels. In the second algorithm, with the aim of reducing the cost of channel switching, the idle radios of SUs have been utilized to parallelize the channel switching and data transmission operations. Simulation results demonstrate that two proposed schemes outperform MRLA [40], RNA and QRSA in terms of QoS parameters. In addition, the second proposed method can significantly overcome the time overhead and interruptions caused by channel switching. Some of the open problems in this research can be listed as follows: It would be interesting to utilize the BR for reducing the overhead of channels switching that are due to the presence of PUs. It is also worthwhile to study about the influence of utilizing the BR on power consumption in the networks. And finally, It is beneficial to extend proposed work for multi hop networks with mobile users.