1 Introduction

A cognitive network is a network with a cognitive process that can perceive current network conditions and then plan, decide, and act on those conditions. The network can learn from these adaptations and use them to make future decisions, while taking into account end-to-end goals [1]. Until now, cognitive networks have mostly been studied from the spectrum and radio perspective. More precisely, this line of work concerns the cognitive radio network, which allows agile and efficient utilization of the radio spectrum by giving distributed terminals or radio cells the abilities of radio sensing, self-adaptation, and dynamic spectrum sharing [2, 3].

Recently, cognitive networks have come to be seen by many actors in the wireless industry as a core technical evolution towards exploiting the full potential of next-generation systems, which are set to revolutionize wireless communications just as the Internet did in its domain. Cognitive radio is an essential methodology for dealing with the current inflexible spectrum division and unbalanced spectrum utilization; it also facilitates the interoperability or convergence of different wireless communication networks [4], which provides a fundamental guarantee for the future wireless cognitive network and self-organized network (SON) with sensing and adaptation features [5]. By carefully sensing the licensed users’ presence and adapting their own transmission parameters to guarantee a certain performance quality for them, these cognitive devices could dramatically improve spectral efficiency [6] and improve network performance while achieving individual and end-to-end network goals [7–9]. Meanwhile, much interest has arisen in applying the ‘cognition’ concept from other fields, such as machine learning, evolution theories from natural systems, mathematics, and economics, that mimic the behavior of today’s communication networks.

Generally, there are two major problems in the wireless cognitive network. One is the coexistence of multiple heterogeneous networks, which enables users in the cognitive networks to obtain diverse QoS experience and maintain seamless communication; the other is dynamic spectrum access in the multi-radio environment, which provides a flexible way to deal with underutilized spectrum. The former has been well studied in [10]. Here we tackle the second issue: in what follows, we explore dynamic spectrum management (DSM) between the primary network (PN, i.e. the licensed network) and secondary networks (SN, i.e. unlicensed networks), both in theory and in practice, through a mathematical behavior modeling approach.

There are three challenges in tackling DSM in the wireless cognitive network. First, a cognitive radio transmitter must adapt, actively or passively, to disturbances in the environment, in particular the time-varying nature of channel characteristics and dynamically varying applications or services. Second, the actions and performance of different network entities (whether in licensed or unlicensed networks) affect one another, i.e. the transmission strategy of a user simultaneously impacts and is impacted by its competitors. Third, because information is decentralized and the multiuser interaction in the wireless resource market is heterogeneous, analyzing that interaction becomes even more difficult.

To cope with these challenges, we study the behavioral properties of individual entities and summarize them in a behavior model for dynamic spectrum management, both in theory and through a concrete instance. Because the entities are intelligent, we attempt to capture mathematically their behavior in strategic situations, in which an individual’s success in making choices depends on the choices of others. In particular, in a heterogeneous cognitive network, spectrum management among multiple wireless networks with different standards must deal with the cooperative and competitive relationships among them. It is therefore natural to study the intelligent behaviors and interactions of networks (cooperative, selfish, or malicious) in dynamic spectrum sharing from the perspective of game theory. We try to explain the behavior of users in the cognitive network through market behavior such as supply, demand, and price mechanisms (e.g., general equilibrium theory), or to characterize the equilibrium outcomes of given games (e.g., auction theory). Applying game theory can also help guarantee fairness and rationality in spectrum management among network operators.

Focusing on the spectrum competition problem in heterogeneous networks, we propose a specific double auction based spectrum trading (DAST) scheme. To further improve the assignment stability and general performance of the cognitive system, we discuss how secondary networks adapt, predict, and learn in order to compete for the time-varying resources, and we explore how they select the associated transmission opportunity in a dynamic environment.

1.1 Related work

Cognitive networks have attracted attention from industry and standards organizations since 2008. The European End-to-End Reconfigurability (E2R) Research and End-to-End Efficiency (E3) projects aim to optimize the use of radio resources and spectrum, following the cognitive radio and cognitive network paradigms [11]. E3 has developed dynamic spectrum management functionality that exploits cognition techniques to optimally assign spectrum in the context of cognitive infrastructures.

Consequently, future wireless deployments will be conceived on a fully cognitive system basis. In particular, ongoing standardization of IMT-Advanced related radio and cognitive systems is targeted, with contributions enabling convergence towards a future harmonized and interoperable wireless landscape. In the framework of a highly dynamic, heterogeneous wireless system, the objective is to propose spectrum and radio resource selection schemes that are efficient in terms of channel disturbance, subscriber ranking, QoS requirements, etc.

However, the radio perspective of cognitive networks dates back to Mitola’s initial definition of cognitive radio (CR) [12] in 1999. After that, the study of cognitive radio stalled for years, remaining essentially at the level of software defined radio. Not until 2005 did Haykin [13] give a fresh definition of CR and present a novel view of intelligence, which brought fresh discussion to dynamic spectrum management. In the last four years, most of the work in cognitive radio has emphasized the technical aspects of spectrum sensing [14–16] and protocols for dynamic spectrum sharing [17, 18], presenting recent developments and open research issues in spectrum management of CR networks and discussing the challenges. Furthermore, market theory brings a novel approach to spectrum sharing from the economic perspective [19–22]. These works treat dynamic spectrum sharing as a spectrum trading process of buying and selling radio resources in the cognitive network environment, where game theory and pricing mechanisms function well. In [23], the authors propose a spectrum selection game in which the secondary users (SUs, i.e. users in the SN) form the strategy space and compete for Spectrum OPportunities (SOP) by choosing the minimal experienced cost function. In addition, the authors of [24] formulate a non-cooperative game for the secondary users by obtaining differentiated pricing functions for different strategies. After comparing with the static game, they propose a dynamic game in which users adjust their strategies based on observations of their previous strategies, obtaining the Nash equilibrium as the solution to spectrum allocation for secondary users. There is also a considerable body of literature on spectrum sharing via auction mechanisms for cognitive radio [25–27].

In summary, although some excellent research on DSM in the wireless cognitive network via pricing or auction mechanisms has been done, obvious problems remain. First, whether in the pricing function [28, 24] or the cost function [23], the authors consider only the requested or allocated spectrum size (bandwidth) and holding time, without distinguishing different spectrum by its physical features, which coarsens the measurement granularity of the utilities. Second, only oligopoly market competition with one primary user (PU, i.e. a user in the PN) and multiple secondary users is considered, and some schemes are simply single-sided spectrum auctions that do not account for the fact that the PNs participating in the auctions are selfish and maximize only their own revenue [27]. Besides, most of the game-theoretic or pricing mechanisms are static, or react to the environment only passively, without forecasting future behavior or reward. In particular, the properties of the secondary network are overlooked and the SNs cannot make any decision on spectrum selection, which weakens their cognitive abilities. Finally, a complete process of how to present the bids and how to implement the double auction has not yet been studied.

1.2 Contributions of the work

In this paper, we present a behavior model that illustrates how the cognitive function performs in the cognitive network. We present an explicit OODA circle based on the specific actions in multi-radio spectrum trading.

For the cognitive spectrum competition, the contributions of the work are summarized as follows:

  1. In the described cognitive network with multiple PNs and multiple SNs, we analyze a behavior model from the perspective of the OODA cognitive circle.

  2. Considering the varying channel and trading history, we formulate two different utility based decide modules for the PNs and SNs according to their separate situations and individual purposes, including the traffic expectation and adaptive learning parameters.

  3. We propose a double auction based spectrum trading (DAST) model for spectrum management in the cognitive network under two different scenarios, distinguished by the supply-and-demand relationship between PNs and SNs, which broadens the practicality of the proposed scheme. We also discuss and test the Bayesian Nash equilibrium of the proposed double auction game.

  4. In the secondary networks, we discuss four different bidding strategies through community negotiation to further the heterogeneous network application.

  5. We present the reciprocal model practice at the system level and show that the proposed schemes serve effectively for spectrum competition in the cognitive network.

1.3 Organization of this paper

The remaining sections of the paper are organized as follows. Section 2 presents a cognitive architecture. Given a particular OODA based behavior model for spectrum competition, Sect. 3 formulates the utility functions for the primary and secondary networks respectively, exploring their specifications and behavioral motivations. Section 4 discusses the interaction between the two utility functions in the LPMS and MPLS scenarios. In Sect. 5, we discuss the existence and location of the equilibrium of the proposed double auction games. Then, Sect. 6 examines the reciprocal negotiation within one secondary network under four cases, and Sect. 7 presents the simulation results. Finally, Sect. 8 concludes the paper and highlights the research issues concerning how the proposed DAST functions in the dynamic spectrum allocation of heterogeneous cognitive networks.

2 Behavior modeling in the cognitive networks

2.1 Primary and secondary networks

Consider the cognitive network shown in Fig. 1, with I primary networks (PNs) surrounded by several heterogeneous secondary networks (SNs), where each PN is assigned licensed bands uniformly. Assume that a PN-controller takes charge of spectrum management in the cognitive network; it not only allocates resources for PUs but also handles dealings with SNs on the currently vacant bands. SNs compete as a union for spectrum access opportunities in the open, unlicensed band, and then allocate them to their SUs according to certain strategies. We assume that users within the SNs are homo parochius [29], i.e. they value the welfare of insiders (SUs in the same SN) more than that of outsiders (SUs in other SNs), evaluate insiders’ personal qualities higher than those of outsiders, and partially suppress personal goals in favor of the goals of the group of insiders. However, the relationship of each SN with a PN is closer to the homo reciprocal society [29], in which the parties interact strategically with a propensity to cooperate with each other. Therefore, SNs obtain access opportunities at a cost while PNs earn revenue by leasing the vacant spectrum to SNs, so that both benefit from win-win cooperation.

Fig. 1 Behavior model prototype in the cognitive network

Specifically, we indicate I PNs and K SNs with the sets \(S_{PN}=\{p_1,p_2,\ldots,p_I\}\) and \(S_{SN}=\{SN_1,SN_2,\ldots ,SN_K\}\) respectively. Moreover, there are \(N_k,k\in\{1,2,\ldots ,K\}\) users in the kth SN, indicated by the set \(SN_{k}=\{s_{j}^{k}\},j\in\{1,2,\ldots, N_k\}\), where \(s_{j}^{k}\) is the jth user in the kth SN. In the cognitive network, the total system bandwidth B is divided into Tc OFDM subchannels. We authorize PN \(p_i,i\in\{1,2,\ldots, I\}\) with \(u^i\) OFDM subchannels indicated by a channel index vector \(A_{p_i}=\{\alpha_{p_i}^n\},\alpha_{p_i}^n \in\{0,1\},n\in\{1,2,\ldots u^i\}\), where \(\alpha_{p_i}^n=1\) indicates that the nth subchannel is allocated to \(p_i\) and \(\alpha_{p_i}^n=0\) that it is not. Denote \(A_{s_{j}^{k}}=\{\beta_{s_{j}^{k}}^n\}, \beta_{s_{j}^{k}}^n\in\{0,1\},n\in\{1,2,\ldots v^j\}\) as the subchannel access vector for SU \(s_{j}^{k}\), where \(\beta_{s_{j}^{k}}^n=1\) indicates that the nth subchannel is released to \(s_{j}^{k}\) and \(\beta_{s_{j}^{k}}^n=0\) that it is not. Define \(g_{p_i}^n, g_{s_{j}^{k}}^n\) as the channel gains for \(p_i\) and \(s_{j}^{k}\) on the nth subchannel. In this model, users can obtain \(g_{p_i}^n\) and \(g_{s_{j}^{k}}^n\) by channel estimation or cluster-based sensing [30, 31]. \(p_{p_i}^n\) is the power of \(p_i\) on the nth subchannel, and the channel noise is assumed to consist of independent and identically distributed (i.i.d.) zero-mean complex additive white Gaussian random variables with variance \(\sigma^2\) on all the links.

Note that each entity in the cognitive network has its own cognitive function, realized by executing a cognitive process. There are five essential modules in the OODA (Observe-Orient-Decide-Act) [32] circle, also called a cognitive circle, as illustrated in the top right corner of Fig. 1: observe, orient, decide, act, and learn. The decide module is the principal one, while the learn module is the delicate part that provides an open, updating, and adaptive orientation within the circle. The OODA circle for this setting is described below.

2.2 Behavior modeling analysis for spectrum competition

Exploiting the “cognition” [13], an OODA cycle for spectrum competition of cognitive network as depicted in Fig. 2 can be interpreted as follows.

Fig. 2 OODA Cycle Interpretation for Spectrum Competition in Cognitive networks

  • Observe

The observe function senses the radio environment, such as channel conditions and context measurements. Specifically, a PN scans the current spectrum utilization, the bidding requests from surrounding SNs, the competing selling signals from surrounding PNs, and so on; an SN detects whether the surrounding spectrum is vacant or occupied, and the corresponding strategies for potential further trade.

  • Orient

The orient function is to analyze and parse the observed messages, receive and interpret policy language, and evaluate this information to determine its importance for the decision.

  • Decide

The decide function is to present and compare the potential choices within the decision region. It arrives at a decision according to knowledge-based decision strategies and works out the parameters to be delivered. In this work, we set up two distinct utility based decision making models for PNs and SNs to cope with their private context situations.

  • Act

The act function is to execute the tangible action that complies with the decided parameters. For the PNs, it is to release the spectrum and sell part of it to the SNs at a certain price; for the SNs, it is to occupy this spectrum and buy part of it from the PNs at a certain price. Note that it is the only module that affects the outside environment.

  • Learn

Of the five modules, the learn function is a delicate one that acts cautiously but has a far-reaching effect. It acquires new skills and evolves intelligently by discovering spectrum usage norms and exceptions, remembering the signatures of the variable radio context, and extracting relevant aspects such as new features.

In this paper, we fold learning into the decide module for simplicity, and propose a utility evaluation based decide module for spectrum competition. Accordingly, the joint consideration of these two modules requires dynamic parameters in the decide module, which are adjustable based on historical and expectation factors. This will be discussed in Sect. 3.6.
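The cycle described above can be sketched in a few lines of Python. This is only an illustrative skeleton under stated assumptions: the class `CognitiveEntity`, the function `buyer_utility`, the dictionary keys, and the 10% decay applied to the learning parameter are all hypothetical and do not come from the paper; the sketch merely shows learning folded into the decide step through one adjustable parameter.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class CognitiveEntity:
    """Hypothetical PN or SN agent that runs one OODA cycle per time slot."""
    utility_fn: Callable[[Dict[str, float], float], float]
    learn_param: float = 1.0  # adjustable parameter used inside the decide step
    history: List[float] = field(default_factory=list)

    def observe(self, environment: Dict[str, Optional[float]]) -> Dict[str, Optional[float]]:
        # Sense the radio environment (offers, bids, channel state).
        return dict(environment)

    def orient(self, observation: Dict[str, Optional[float]]) -> Dict[str, float]:
        # Keep only the measurements that are actually available.
        return {k: v for k, v in observation.items() if v is not None}

    def decide(self, context: Dict[str, float]) -> float:
        # Utility-based decision; the learning parameter enters here.
        return self.utility_fn(context, self.learn_param)

    def act(self, utility: float) -> bool:
        # Trade only when the evaluated utility is positive.
        return utility > 0.0

    def learn(self, utility: float) -> None:
        # Assumed update rule: shrink the parameter after an unprofitable slot.
        self.history.append(utility)
        if utility <= 0.0:
            self.learn_param *= 0.9

    def step(self, environment: Dict[str, Optional[float]]) -> bool:
        context = self.orient(self.observe(environment))
        utility = self.decide(context)
        traded = self.act(utility)
        self.learn(utility)
        return traded


def buyer_utility(context: Dict[str, float], learn_param: float) -> float:
    # Hypothetical buyer-side utility: benefit minus a scaled payoff.
    return context["benefit"] - learn_param * context["payoff"]
```

Calling `CognitiveEntity(utility_fn=buyer_utility).step({"benefit": 2.0, "payoff": 1.0})` runs one full cycle and reports whether the entity would trade in that slot.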

3 Formulating the utility for evaluations

In the cognitive network described above, two kinds of interests deserve attention. First, service in the PNs has priority and must be guaranteed; each PN is assigned licensed bands so that they are available for any possible transmission. Second, the interests of the SNs are to be satisfied on a best-effort basis. In this paper, we use a spectrum trading model to handle dynamic spectrum management, in which PNs sell free bands and SNs buy them. The objective of spectrum trading is to maximize the revenue of the PNs while maximizing the satisfaction of the SNs. An evaluation mechanism has to be established to support the exchange process in the spectrum market. We therefore formulate two different utilities for PNs and SNs according to their practical situations and community purposes in the cognitive network. In a distributed architecture, the cognitive radio entities observe and make decisions independently [33]. Accordingly, we assume that entities in the cognitive networks are selfish but rational; each of them will establish a trustworthy reputation by reporting its true payoff or cost. The two utilities designed for the decisions of PNs and SNs are described after the network challenges have been addressed.

3.1 Utility function

Generally, the concept of a utility function is used to quantify the satisfaction of entities, e.g., networks or participants. Given the particular nature of spectrum competition in cognitive networks, a spectrum supply and demand model exists in terms of a market mechanism. In the game theoretical methodology, a game is formulated to capture the selfish and cooperative behavior of the players. Considering that users in these networks experience different conditions, different spectrum availability, and different transmission expectations, we establish two separate utility functions that reflect their differing behavior. Different types of utility functions (e.g., a logarithmic or sigmoid function of transmission rate) can be used to evaluate performance, and how nodes choose their utility function can significantly impact network behavior. To further complicate matters, how utility functions impact network behavior varies from situation to situation and changes over time [15].

When designing the utility functions for cognitive spectrum competition, three considerations deserve attention. First, the function is usually designed to be concave so that it can represent the saturation of user satisfaction as the transmission rate increases. Second, from the economic point of view, there are two components in spectrum trading: the revenue or benefit part (the PNs gain revenue by releasing the spectrum, while the SNs benefit from the transmission), and the cost or payoff part (the PNs bear a cost for releasing the spectrum, while the SNs pay for the lease). In addition, the payoff is transferred from the SNs to the PNs fully or partially, according to the business mechanism. If we assume that there is no discount between them, the payoff of the SNs equals the revenue of the PNs. Third, the utility function should be dynamic and adjustable, so that parameters can be updated flexibly in response to the variability caused by node mobility, varying channels, elastic traffic, etc.

In what follows, we address these issues for PNs and SNs separately. We first examine the challenges faced by the different entities in the spectrum trading process, and then formulate utility functions to capture them. Assume that PNs have access to perfect, completely up-to-date information about each SN’s bidding price, but no knowledge of other PNs’ payoffs. Meanwhile, users within the same SN share transmission information with each other freely, but know nothing about other SNs’ information because of the spectrum competition.

3.2 Challenges for PNs

For the licensed PNs, there are several challenges to consider.

  • Under the macroeconomic control of i-NodeB. A PN competes with other PNs for spectrum allocation under the authority of a cell i-NodeB, according to its own usage of the resource, and leverages the income.

  • Market competition from other PNs. PNs contend for revenue by competing for business opportunities with potential clients. This means that a PN has to set up a proper selling strategy (i.e. the quantity, the selling price, etc.) to maximize its profit. For example, if it sets a higher price than its competitors, SNs might turn away from buying, causing a loss of customers. Conversely, if it sells at a price lower than the intrinsic cost, it is at a disadvantage in the exchange.

  • Delicate supply and demand relationship. There is a delicate supply and demand relationship between the sellers (i.e. PNs) and buyers (i.e. SNs). A seller will not trade unless it receives a transmission request. Once the trade collides with its own usage, it tends to break off the cooperation immediately. SNs, however, have no intention of tolerating such breaches of trade. Therefore, to establish its reputation, a PN should think twice before withdrawing a release when its own transmission requirement arises.

  • Countermeasures to collusion among the SNs. PNs also risk being cheated by SNs that intend to lease channels at a lower payoff. If unlicensed users within the SNs collude with each other, the PN might accept an arbitrarily low bidding price from the collusion rings [19]. Therefore, designing an efficient collusion-resistant utility that preserves order in the market while sharing the spectrum dynamically becomes an urgent and crucial task.

  • Instant regulation of marketing strategy. Considering the spectrum dynamics caused by wireless channel variations, user mobility, or varying wireless traffic, as well as competing PNs and heterogeneous SNs, a PN faces a sensitive and unstable market environment. Hence, it has to adjust its marketing strategy intelligently to succeed in the market competition.

3.3 Utility design of decision making model for the PNs

Based on the previous considerations, we now define the utility function for the PNs. Market sellers normally calculate their net income by subtracting cost from their apparent revenue. Specifically, we define \(R_{p_i}^n\) as the revenue of \(p_i\) on the nth subchannel from leasing the licensed spectrum, and \(C_{p_i}^n\) as the corresponding cost of leasing the spectrum. If PN \(p_i\) intends to lease its channels to an SN \(SN_k\), the utility function of \(p_i\) over all the subbands it holds can be written as in (1).

$$ U_{p_i} = \sum_{n=1}^{u^i}u_{p_i}^n=\sum_{n=1}^{u^i}{\alpha}_{p_i}^n \pi_{p_i}^n\left(a_{p_i,n}^{SN_{k}}R_{p_i}^n-C_{p_i}^n\right) $$
(1)

In this function, \(R_{p_i}^n\) comes from the payoff of \(SN_k\) for the spectrum exchange. \(C_{p_i}^n\) is the PN’s own valuation of the spectrum; we express it as an equivalent rate on the nth channel, as in (2), to account for the usage loss the PN would suffer if it unexpectedly needed a band that could not be vacated immediately. Here, \(\pi_{p_i}^n\in \{0,1\}\) represents the Opportunity of Transmission (OoT) for \(p_i\): \(\pi_{p_i}^n=0\) indicates that \(p_i\) takes up the current OoT on the nth subchannel, and \(\pi_{p_i}^n=1\) indicates that the nth subchannel is vacant. Besides, \(a_{p_i,n}^{SN_{k}}\) is an indicator, with \(a_{p_i,n}^{SN_{k}}=1\) denoting a successful release of subchannel n from \(p_i\) to \(s_{j}^{k}\).

$$ C_{p_i}^n=\phi^n_{p_i}r_{p_i}^n =\phi^n_{p_i}\frac{B}{Tc}log_2\left(1+ \frac{\left|g_{p_i}^n\right|^2{SNR}_{p_i}^n}{\Upgamma}\right) $$
(2)

Concretely, let \(\phi^n_{p_i}\) denote the probability that \(p_i\) takes up the subchannel for its next transmission, which is used to forecast future usage and potential services. \(r_{p_i}^n\) represents the average rate of \(p_i\) on the nth subchannel. Therefore, \(\phi^n_{p_i}r_{p_i}^n\) is used to evaluate the cost of releasing the channel. Denote \({SNR}_{p_i}^n=p_{p_i}^n/\sigma^2\) as the transmission signal-to-noise ratio (SNR), and \(\Upgamma=-ln(5B_i^{min})/1.5\) as the SNR gap related to a minimal targeted bit-error rate \(B_i^{min}\).

In summary, the utility function of \(p_i\) on all the holding subbands can be written as follows in (3).

$$ U_{p_i} = \sum_{n=1}^{u^i}{\alpha}_{p_i}^n\pi_{p_i}^n \left(a_{p_i,n}^{SN_{k}}R_{p_i}^n-\phi^n_{p_i} \frac{B}{Tc}log_2\left(1+\frac{\left|g_{p_i}^n \right|^2{SNR}_{p_i}^n}{\Upgamma}\right)\right) $$
(3)
$$ \begin{aligned} &\hbox{s.t.} \quad\sum\limits_{i=1}^I{u}_{i}\leq Ts \quad(\hbox{C}1)\\ &\sum\limits_{i=1}^I\sum\limits_{n=1}^{u_i}\alpha_{p_i}^n\leq Ts \quad(\hbox{C}2) \end{aligned} $$

From the above definition, we can see that the function has two parts: the revenue from the SN, and the potential loss written as the cost of spectrum trading. Each PN wants to earn as much as possible by leasing its unused channels while minimizing the cost. Therefore, it seeks to maximize (3) by selecting a preferred buyer. Moreover, all the revenue and cost information is private, and the PNs will not reveal it to others because of their selfish nature. However, if the revenue and cost mechanism is standardized, the utility will be comparable among the PNs. Therefore, we go on to design a generalized forecasting factor and a rational bargaining mechanism in the following sections.
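To make the structure of (1)–(3) concrete, the following minimal Python sketch evaluates the PN utility for a list of subchannels. The function names and the dictionary layout are illustrative assumptions, not part of the paper; the arithmetic follows (2) and (3) directly, with `num_subchannels` standing for Tc and `ber_min` for the targeted bit-error rate.

```python
import math


def snr_gap(ber_min):
    """SNR gap, Gamma = -ln(5 * BER_min) / 1.5; positive only for BER_min < 0.2."""
    return -math.log(5.0 * ber_min) / 1.5


def pn_cost(phi, bandwidth, num_subchannels, gain, snr, gamma):
    """Cost of releasing one subchannel, Eq. (2): forecast factor times rate."""
    rate = (bandwidth / num_subchannels) * math.log2(
        1.0 + abs(gain) ** 2 * snr / gamma
    )
    return phi * rate


def pn_utility(subchannels, bandwidth, num_subchannels, gamma):
    """PN utility, Eq. (3), summed over the subchannels held by the PN.

    Each entry of `subchannels` is a dict (hypothetical layout) with keys:
    alpha (allocated to the PN), pi (1 if vacant), a (1 if released to the SN),
    revenue, phi, gain, snr.
    """
    total = 0.0
    for ch in subchannels:
        cost = pn_cost(ch["phi"], bandwidth, num_subchannels,
                       ch["gain"], ch["snr"], gamma)
        total += ch["alpha"] * ch["pi"] * (ch["a"] * ch["revenue"] - cost)
    return total
```

Note that a subchannel the PN is currently using (`pi = 0`) contributes nothing, so only vacant subchannels enter the comparison between revenue and forecast cost.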

3.4 Challenges for SNs

For the unlicensed SNs, the following challenges also need to be considered.

  • Observe and detect the surrounding frequency tone and channel status. Users in the SNs hold no license for any fixed frequency band; therefore, they have to make use of every available slice of frequency or space to obtain transmission opportunities.

  • Compete with SUs in other SNs as a coalition. SUs in the same SN are likely to work as a union to maximize the welfare of their local community.

  • Choose from PNs to buy spectrum opportunities. Each SN selects the vendor according to an average preference for the available channels (i.e., the size and status of the spectrum offered by PNs, the spectrum prices, etc.).

  • Discuss and decide which user within the SN receives the transmission opportunity that has been won. Once the eligible PNs have been selected, SUs within the SN discuss with each other to decide who will make use of the channel. Assume that SUs within the same SN are harmonious and can negotiate on a fair and rational basis. They can therefore assign the opportunity to users according to the negotiated order.

  • Learn and gradually change the decision on spectrum selection. Because of their secondary position in the spectrum trading market, SNs have to adjust the payoff gradually to obtain access opportunities. Considering the varying radio environment and trading history, they are motivated to change their decisions through a learning method based on observed knowledge.

3.5 Utility design of decision making model for the secondary networks

Similarly, we define the utility function for SNs after analyzing the above challenges. As each user in the SNs has a different preference for channels, it is better to define the utility for SNs from the secondary user’s perspective, and then derive a network preference by internal negotiation. As market buyers, they obtain the use of goods as the benefit while paying for the transaction. Specifically, let \(B_{s_{j}^{k}}^n\) be the benefit obtained if SU \(s_{j}^{k}\) successfully rents the nth subchannel from the PN \(p_i\), and let \(P_{s_{j}^{k}}^n\) denote the corresponding payoff for the rental. The utility function of SU \(s_{j}^{k}\) can be modelled as in (4).

$$ U_{s_{j}^{k}} = \sum_{n=1}^{v^j}u_{s_{j}^{k}}^n=\sum_{n=1}^{v^j} {\beta}_{s_{j}^{k}}^n\left(B_{s_{j}^{k}}^n-P_{s_{j}^{k}}^n\right) $$
(4)

For the SU, the most direct benefit is the potential transmission ability, which we represent by the transmission rate. Therefore, the benefit \(B_{s_{j}^{k}}^n\) can be derived as (5),

$$ B_{s_{j}^{k}}^n = a_{{p_i},n}^{SN_{k}}\frac{B}{Tc}log_2\left(1+\frac{ \left|g_{s_{j}^{k}}^n\right|^2{SNR}_{s_j^k}^n}{\Upgamma}\right) $$
(5)

When designing the payoff, there are several things to consider. First, the SUs do not want to pay much for the spectrum unless they need it badly. Therefore, the payoff \(P_{s_{j}^{k}}^n\) is directly proportional to the degree of transmission urgency. We let \(\varphi_{s_{j}^{k}}\) denote the accumulated number of bits in all the waiting packets, as in [34]. Second, the SUs prefer to choose channels in good condition; hence, the payoff is directly proportional to the propagation gain. Third, the payoff is inversely proportional to a logarithm of the SNR, which reflects some intrinsic transmitting ability. For instance, an SU will cut down its payment if it exhibits a higher SNR, since a higher-SNR user has a less stringent requirement on the transmission environment. The payoff \(P_{s_{j}^{k}}^n\) can be expressed as (6),

$$ P_{s_{j}^{k}}^n =\lambda_{s_{j}^{k}}{\frac{\varphi_{s_{j}^{k}} \left|g_{s_{j}^{k}}^n\right|^2}{log_2{\left(1 +SNR_{s_{j}^{k}}^n\right)}}} $$
(6)

Herein, \(\lambda_{s_{j}^{k}}\) is an adaptive factor which enables the SUs to learn flexibly. It will be explicated in the next subsection.

In summary, the utility function of \(s_{j}^{k}\) on all the potential subbands can be written as follows.

$$ U_{s_{j}^{k}} =\sum_{n=1}^{v^j}{\beta}_{s_{j}^{k}}^na_{{p_i},n}^{SN_{k}} \left(\frac{B}{Tc}log_2\left(1+\frac{\left|g_{s_{j}^{k}}^n\right|^2 {SNR}_{s_j^k}^n}{\Upgamma}\right)-\lambda_{s_{j}^{k}}{ \frac{\varphi_{s_{j}^{k}}\left|g_{s_{j}^{k}}^n\right|^2} {log_2{\left(1+SNR_{s_{j}^{k}}^n\right)}}}\right) $$
(7)
$$ \begin{aligned} &\hbox{s.t.}\quad\sum\limits_{i=1}^I{v}_{j}\leq Ts, \quad(\hbox{C}1)\\ &\sum\limits_{i=1}^I\sum\limits_{n=1}^{v_j}{\beta}_{s_{j}^{k}}^n\leq Ts\quad(\hbox{C}2) \end{aligned} $$
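The SU side can be sketched in the same way. The Python functions below are a minimal illustration of (5)–(7); their names and the dictionary layout are assumptions made for this example. The sketch follows the combined form (7), in which the release indicator multiplies both the rate term and the payoff term, and it assumes a strictly positive SNR so that the logarithm in (6) is nonzero.

```python
import math


def su_rate(bandwidth, num_subchannels, gain, snr, gamma):
    """Achievable rate on one subchannel, the rate term of Eq. (5)."""
    return (bandwidth / num_subchannels) * math.log2(
        1.0 + abs(gain) ** 2 * snr / gamma
    )


def su_payoff(lam, backlog_bits, gain, snr):
    """Payoff of Eq. (6): grows with urgency and gain, shrinks with log-SNR.

    Assumes snr > 0; otherwise log2(1 + snr) is zero.
    """
    return lam * backlog_bits * abs(gain) ** 2 / math.log2(1.0 + snr)


def su_utility(subchannels, lam, backlog_bits, bandwidth, num_subchannels, gamma):
    """SU utility of Eq. (7), summed over candidate subchannels.

    Each entry of `subchannels` is a dict (hypothetical layout) with keys:
    beta (accessed by the SU), a (1 if released by the PN), gain, snr.
    """
    total = 0.0
    for ch in subchannels:
        rate = su_rate(bandwidth, num_subchannels, ch["gain"], ch["snr"], gamma)
        payoff = su_payoff(lam, backlog_bits, ch["gain"], ch["snr"])
        total += ch["beta"] * ch["a"] * (rate - payoff)
    return total
```

Raising `lam` increases the payoff linearly and therefore lowers the utility, which is the lever the adaptive factor provides to the SU.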

Usually, users in the same SN work as a union to compete with other candidates. Therefore, there is a network bidding price, i.e.,

$$ P_{SN_{k}}^n=f\left(P_{s_{j}^{k}}^n\right) $$
(8)

Meanwhile, the network utility equals the eligible user’s utility, i.e. (9), where the function f() is derived according to the different strategies in Sect. 5.

$$ U_{SN_{k}}^n=f\left(U_{s_{j}^{k}}^n\right) $$
(9)

3.6 Expectation and learning

To achieve the cognitive learning process, expectation and learning parameters are built into the design of the utility. Since the PNs’ services vary in a stochastic manner (i.e. due to flexible service types, discretionary packet arrivals, and arbitrary departures), the spectrum supply and the price charged to the secondary users become random. Here, we use a parameter \(\phi^n_{p_i}\) to forecast the transmission of users in PN \(p_i\) in the next slot; the PNs can thereby vary the size of the spectrum opportunities to be sold to the SNs. The estimated forecasting parameter can be obtained as follows.

$$ \phi^n_{p_i} =\rho^{'}_{p_i}\frac{\varsigma^{'}(t+1)}{B_{p_i}^{max}} $$
(10)

Herein, \(\varsigma^{'}(t+1)\) is the estimated packet arrival for the next time slot. It can be obtained by reinforcement learning, but in this work we model the packet arrivals as following the Poisson distribution, as in [35] and [36]. \(B_{p_i}^{max}\) is the maximal buffer size of user \(p_i\). Besides, \(\rho^{'}_{p_i}\) is a parameter mapping the forecast service of \(p_i\), which can be refined by service modelling. Therefore, \(\phi^n_{p_i}\) reflects the transmission expectation estimated from the service types and arriving packets, which moderates the competition from other PNs and adaptively tracks the supply and demand relationship in the spectrum trading market.
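To make (10) concrete, the following sketch computes the forecasting parameter from a short history of per-slot arrivals. The estimator is an assumption made for illustration: under a Poisson model the sample mean is the maximum-likelihood estimate of the arrival rate, so it is used here in place of \(\varsigma^{'}(t+1)\). The function names and numbers are hypothetical.

```python
def poisson_arrival_estimate(past_counts):
    """Assumed estimator of the next-slot arrivals: the sample mean,
    i.e. the maximum-likelihood estimate of a Poisson rate."""
    if not past_counts:
        raise ValueError("at least one observed slot is required")
    return sum(past_counts) / len(past_counts)

def forecast_parameter(rho, expected_arrivals, buffer_max):
    """Eq. (10): phi = rho' * varsigma'(t+1) / B_max."""
    if buffer_max <= 0:
        raise ValueError("buffer_max must be positive")
    return rho * expected_arrivals / buffer_max

# Hypothetical history of packet arrivals in three past slots.
arrivals = poisson_arrival_estimate([2, 4, 6])
phi = forecast_parameter(rho=0.5, expected_arrivals=arrivals, buffer_max=100)
```

A larger expected arrival count or a smaller buffer raises \(\phi^n_{p_i}\), which signals that the PN is more likely to need the spectrum itself in the next slot.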

As to the SUs, we introduce an adaptive learning parameter \(\lambda_{s_{j}^{k}}\) that enables the SUs to adjust the payoff according to not only their own requirements but also the dynamic exchange in the market. A higher value of \(\lambda_{s_{j}^{k}}\) means that the SU needs this OoT more urgently, and vice versa. Note that \(\lambda_{s_{j}^{k}}\) is an intelligent parameter that is perceived and predicted from historically observed information and a stored interpreting algorithm. It can be obtained using learning algorithms from artificial intelligence [37]; however, this is not the focus of this paper. Here we only adopt the learning parameters in the utility design and discuss their effect in the simulation.

3.7 Interaction of the two utilities

Social decisions can be imposed by the central spectrum management or negotiated in a decentralized manner among the wireless users, e.g., using bargaining solutions to maintain certain fairness rules. As is common in a market, the payoff is not entirely transferred from the SNs to the PNs due to the market mechanism. Therefore, we introduce a tax factor between the PN's revenue \(R_{p_i}^n\) and the SN's bidding price \(P_{SN_{k}}^n\) as follows.

$$ R_{p_i}^n =\zeta P_{SN_{k}}^n $$
(11)

Herein, \(0<\zeta\leq 1\) is the bargain coefficient between PNs and SNs. When \(\zeta=1\), the payoff is totally transferred from the SUs to the leasing PN; otherwise, a fraction \((1-\zeta)\) of the payoff is taxed by the system or taken by the spectrum agency.
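The split implied by (11) can be written out in a few lines. The function name `split_payoff` and the example values are illustrative assumptions.

```python
def split_payoff(network_bid, zeta):
    """Eq. (11): the PN receives zeta * P_SN; the remainder is taxed.

    Returns (pn_revenue, taxed_amount).
    """
    if not 0.0 < zeta <= 1.0:
        raise ValueError("zeta is expected to lie in (0, 1]")
    pn_revenue = zeta * network_bid
    return pn_revenue, network_bid - pn_revenue

# Hypothetical bid of 10 units with 20 % retained by the system.
revenue, tax = split_payoff(network_bid=10.0, zeta=0.8)
```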

4 Double auction based decision making for spectrum competition

Because the interaction between primary and secondary networks resembles business trading in economics, we are motivated to design a double auction [38] based spectrum trading scheme for spectrum management in the cognitive network. Based on the previously defined utility functions, it is feasible to model the buyers' bids and the sellers' asking prices as strategies driven by their respective valuations.

In a double auction game, the players are buyers and sellers (i.e. secondary networks and primary networks, respectively). They have some valuations of the goods (spectrum, transmission slots, etc.) that are traded in the auction. Their strategies are bids for buyers and asks for sellers, which depend on the players' valuations. Moreover, the payoffs depend on the price of the transaction and the valuation of a player.

In the traditional auction mechanism, the PN with the lowest acquisition cost and the SN with the highest reward payoff are eligible to trade. However, in most cases, a PN’s favorite purchaser will be reluctant to take part in the game, which results in a disagreement between supply and demand. The dilemma lies in who takes the initiative in the purchase. To resolve this contradiction, we discuss the spectrum management between the PNs and SNs in two scenarios distinguished by the relationship between supply and demand: (1) supply falls short of demand (i.e. fewer PNs and more SNs); (2) supply exceeds demand (i.e. more PNs and fewer SNs). Note that in the cognitive network, the i-NodeB is in charge of the network topology information and balances between the supply-dominant and demand-dominant scenarios. In what follows, we assume that the available channels from the PNs are leased for a certain time period T. We also assume that the costs of the PNs and the reward payoffs of the SNs remain unchanged over this period.

4.1 Less primary networks more secondary networks (LPMS) scenarios

If there are more SNs and relatively fewer PNs, the case is PN-dominated and the PN works as the auctioneer. As depicted in Fig. 3, the spectrum access period begins with the broadcast of the OoT from the PN. Then, each SU calculates the payoff \(P_{s_{j}^{k}}^n\) and utility \(u_{s_{j}^{k}}^n\) according to (6) and (7) by sensing the conditions of the available channel. After negotiation within the same SN, each SN comes up with a bidding strategy \(P_{SN_{k}}^n\), and the SNs then bid for the available channels with the price vector \(\vec{P}_{SN_{k}}=\{P_{SN_{k}}^n\}\). The payoff is transferred to the PNs after being taxed by (11). The PNs then calculate their utility according to (3) and feed back a channel assignment message indicating to whom the current channel is leased, by the optimization in (12).

$$ \begin{gathered} o(\beta _{{s_{j}^{k} }}^{n} ) = argmax_{{s_{j}^{k} }} u_{{p_{i} }}^{n} (g_{{p_{i} }}^{n} ,\alpha _{{p_{i} }}^{n} ,\pi _{{p_{i} }}^{n} ,a_{{p_{i} ,n}}^{{SN_{k} }} ) \hfill \\ {\text{s}}{\text{.t}}{\text{.}}\quad u_{{s_{j}^{k} }}^{n} \ge u_{{s_{j}^{k} }}^{r} ,\quad ({\text{C}}1) \hfill \\ \sum\limits_{{n = 1}}^{{u^{i} }} {\alpha _{{p_{i} }}^{n} } \phi _{{p_{i} }}^{n} \le T_{c} \quad ({\text{C}}2) \hfill \\ \sum\limits_{{j = 1}}^{J} {a_{{p_{i} ,n}}^{{SN_{k} }} } \le 1\quad ({\text{C}}3) \hfill \\ \end{gathered} $$
(12)
Fig. 3 Procedure flow for LPMS between PNs and SNs

Note that although the PNs are dominant in the network, the SNs have the right to accept or refuse the channel assignment of (12), since each of them is intelligent and rational. Assume that each SU has a reserved utility \(u_{s_{j}^{k}}^r\) as a bottom line. The SN accepts the channel assignment by the PN only when (12-C1) holds; otherwise, it refuses the assignment. (12-C2) means that the OoT is restricted by the licensed subcarriers, and (12-C3) requires that each subcarrier is assigned to one SN exclusively. After a refusal, the PN chooses its favorite buyer from the remaining candidates, and so on until it receives an accept message from an SN. For the rest of the period T, the SN rents the channel for communication until the channel is released. A new period starts when the SU sends back the release signal, and so forth. The explicit traffic flow of this procedure is illustrated in Fig. 3.
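The accept/refuse loop of the LPMS procedure can be sketched for a single channel as follows. Since the PN utility of (3) is defined earlier in the paper, this sketch assumes, purely for illustration, that the PN prefers the highest bid; the function name, the dictionary layout, and the numbers are hypothetical.

```python
def lpms_assign(bids, su_utilities, reserved_utilities, zeta=1.0):
    """PN-led assignment of one channel (illustrative sketch).

    bids               -- {sn_id: network bidding price P_SN}
    su_utilities       -- {sn_id: utility the SN would obtain on this channel}
    reserved_utilities -- {sn_id: reserved (bottom-line) utility}
    zeta               -- tax factor of Eq. (11)
    Returns (winner, pn_revenue), or (None, 0.0) if every candidate refuses.
    """
    # Assumption: ranking by bid stands in for maximising the PN utility in (12).
    for sn_id in sorted(bids, key=bids.get, reverse=True):
        # (12-C1): the SN accepts only if its utility reaches its reserve.
        if su_utilities[sn_id] >= reserved_utilities[sn_id]:
            return sn_id, zeta * bids[sn_id]
        # Otherwise the SN refuses and the PN moves to the next candidate.
    return None, 0.0

winner, revenue = lpms_assign(
    bids={"SN1": 5.0, "SN2": 8.0, "SN3": 6.0},
    su_utilities={"SN1": 3.0, "SN2": 1.0, "SN3": 4.0},
    reserved_utilities={"SN1": 2.0, "SN2": 2.0, "SN3": 2.0},
    zeta=0.9,
)
```

In this example the highest bidder refuses because its utility is below its reserve, so the channel goes to the next candidate. Returning after the first acceptance reflects (12-C3), since the channel is leased to one SN only.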

4.2 More primary networks less secondary networks (MPLS) scenarios

If more PNs and fewer SNs coexist, the case is SN-dominated and the SN works as the auctioneer. The initialization of the auction process is similar to that of LPMS, except that the decision originates from the SN. Each SN calculates the utility value according to (7) and decides to whom it sends the bidding message by the optimization in (13).

$$ \begin{gathered} o(a_{{p_{i} ,n}}^{{SN_{k} }} ) = argmax_{{s_{j}^{k} }} u_{{s_{j}^{k} }}^{n} (g_{{s_{j}^{k} }}^{n} ,\beta _{{s_{j}^{k} }}^{n} ) \hfill \\ {\text{s}}{\text{.t}}{\text{.}}\quad u_{{p_{i} }}^{n} \ge u_{{p_{i} }}^{r} ,\quad ({\text{C}}1) \hfill \\ \sum\limits_{{j = 1}}^{J} {\beta _{{s_{j}^{k} }}^{n} } \le 1\quad ({\text{C}}2) \hfill \\ \end{gathered} $$
(13)

Correspondingly, each PN has a reserved utility \(u_{p_i}^r\), according to which it works out the accept or refuse message. If its private utility is higher than \(u_{p_i}^r\), i.e. (13-C1) holds, it does not refuse the bid but keeps it until it chooses its favorite among the candidate bidders by (13). Then, it returns the accept message to \(s_{j}^{k}\) and refuse messages to the others, i.e. (13-C2) holds. This ensures that each orthogonal subchannel is assigned to one SN exclusively. The process from the accept message to the channel release is the same as in LPMS. The explicit traffic flow of this procedure is illustrated in Fig. 4.

Fig. 4 Procedure flow for MPLS between PNs and SNs
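The MPLS procedure mirrors the LPMS one with the roles exchanged. The following sketch shows the choice of a seller by one SN; it is a simplified illustration in which the utilities are supplied as precomputed numbers, and the function name and data layout are assumptions.

```python
def mpls_assign(su_utility_per_pn, pn_utilities, pn_reserved):
    """SN-led choice of a seller for one SN (illustrative sketch).

    su_utility_per_pn -- {pn_id: utility the SN obtains on this PN's channel}
    pn_utilities      -- {pn_id: utility of the PN if it leases to this SN}
    pn_reserved       -- {pn_id: reserved utility of the PN}
    Returns the accepted pn_id, or None if every PN refuses.
    """
    # The SN approaches the PNs in decreasing order of its own utility,
    # which corresponds to the arg max in (13).
    for pn_id in sorted(su_utility_per_pn, key=su_utility_per_pn.get, reverse=True):
        # (13-C1): the PN accepts only if its utility reaches its reserve.
        if pn_utilities[pn_id] >= pn_reserved[pn_id]:
            return pn_id
    return None

chosen = mpls_assign(
    su_utility_per_pn={"PN1": 4.0, "PN2": 7.0},
    pn_utilities={"PN1": 2.0, "PN2": 0.5},
    pn_reserved={"PN1": 1.0, "PN2": 1.0},
)
```

Here the SN's preferred seller refuses because the deal falls below its reserved utility, so the SN trades with its second choice. The exclusivity constraint (13-C2) across several competing SNs is not modelled in this single-SN sketch.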

5 Bayesian Nash equilibrium for the two double auction games

Suppose that, in our spectrum trade model, the buyers and sellers only have private information about their own valuations, which forms a Bayesian game with incomplete (asymmetric) information [39]. In what follows, we discuss the existence of the Bayesian Nash equilibria and where they lie.

5.1 Existence of the equilibrium

To guarantee that the equilibrium exists, it suffices for the players’ preferences to be convex (although with enough players this assumption can be relaxed). In this paper, we define the preferences of \(p_i\) and \(s_{j}^{k}\) on different spectrum channels as the utility evaluation functions \(u_{p_i}^n\) and \(u_{s_{j}^{k}}^n\). Therefore, in what follows, we first discuss the convexity of the preferences \(u_{p_i}^n\) and \(u_{s_{j}^{k}}^n\).

Lemma

A preference relation \(u_{p_i}^n\) (and likewise \(u_{s_{j}^{k}}^n\)) on the trade set \(\varpi\) is convex if, for any payoff preferences \(x, y, z\in\varpi\) with \(u_{p_i}^n(y)\geq u_{p_i}^n(x)\) and \(u_{p_i}^n(z)\geq u_{p_i}^n(x)\), and for any \(\theta\in [0, 1]\), the inequality \(\theta\cdot u_{p_i}^n(y)+(1-\theta)\cdot u_{p_i}^n(z)\geq u_{p_i}^n(x)\) holds.

Proof

When \(u_{p_i}^n(y)\geq u_{p_i}^n(z)\): \(\theta\cdot u_{p_i}^n(y)+(1-\theta)\cdot u_{p_i}^n(z)\geq \theta\cdot u_{p_i}^n(z)+(1-\theta)\cdot u_{p_i}^n(z) =u_{p_i}^n(z)\geq u_{p_i}^n(x).\)

When \(u_{p_i}^n(y)\leq u_{p_i}^n(z)\): \(\theta\cdot u_{p_i}^n(y)+(1-\theta)\cdot u_{p_i}^n(z)\geq \theta\cdot u_{p_i}^n(y)+(1-\theta)\cdot u_{p_i}^n(y) =u_{p_i}^n(y)\geq u_{p_i}^n(x).\)

Therefore, \(u_{p_i}^n\) is a convex preference, which guarantees that a general equilibrium exists, since the players’ strategies are monotonically increasing with the payoff.

The proof for \(u_{s_{j}^{k}}^n\) is similar.
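The inequality used in the proof can also be checked numerically. The sketch below is only a sanity check on randomly drawn utility values, not a substitute for the proof; the function name, the sampling ranges, and the small floating-point tolerance are assumptions.

```python
import random

def convex_combination_holds(u_x, u_y, u_z, theta, tol=1e-9):
    """Check theta*u(y) + (1-theta)*u(z) >= u(x),
    given u(y) >= u(x), u(z) >= u(x) and theta in [0, 1]."""
    if not (u_y >= u_x and u_z >= u_x and 0.0 <= theta <= 1.0):
        raise ValueError("preconditions of the lemma are not met")
    # tol only absorbs floating-point rounding in the weighted sum.
    return theta * u_y + (1.0 - theta) * u_z >= u_x - tol

rng = random.Random(0)  # fixed seed so that the check is repeatable
results = []
for _ in range(1000):
    u_x = rng.uniform(-10.0, 10.0)
    u_y = u_x + rng.uniform(0.0, 5.0)   # guarantees u(y) >= u(x)
    u_z = u_x + rng.uniform(0.0, 5.0)   # guarantees u(z) >= u(x)
    results.append(convex_combination_holds(u_x, u_y, u_z, rng.random()))
all_hold = all(results)
```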

5.2 Where is the equilibrium?

In the static Bayesian game, a strategy for the seller is a function \(u_{p_i}^n\) specifying the benefit value that the seller will demand for each of the buyer’s possible valuations; likewise, a strategy for the buyer is a function \(u_{s_{j}^{k}}^n\) specifying the cost price that the buyer will offer for each of the seller’s possible valuations.

In the LPMS scenario, the seller seeks to maximize \(u_{p_i}^n\), conditional on the payoff from the target buyer being higher than the cost and being the best among the payoffs from the competing buyers. Hence, for each payoff, the Bayesian Nash equilibrium occurs where (14) holds.

$$ max u_{p_{i}}^n =max\left\{\left(R_{p_i}^n-C_{p_i}^n\right) pr\{R_{p_i}^n>C_{p_i}^n\}pr\{\varsigma R_{p_i}^n>P_{s_{j}^{k}}^n\left(u_{s_{j}^{k}}^n\right)\}\right\} $$
(14)

where \(P_{p_i}^n(u_{p_i}^n)\) is the inverse function of \(u_{p_i}^n\) for the bidding payoff.

In the MPLS scenario, the buyers’ strategies are the best responses in the auction game. A buyer prefers to trade with a counterpart on condition that the payoff is lower than its expected valuation and that the deal is the best among all the spectrum suppliers. Therefore, the strategy specifies a Bayesian Nash equilibrium if the payoff solves (15).

$$ maxu_{s_{j}^{k}}^n = max\left\{\left(B_{s_{j}^{k}}^n-P_{s_{j}^{k}}^n\right) pr\left\{B_{s_{j}^{k}}^n>P_{s_{j}^{k}}^n\right\}pr \left\{\left(B_{s_{j}^{k}}^n-P_{s_{j}^{k}}^n\right)> \left(B_{s_{j}^{k}}\left(u_{s_j^k}^n\right)-P_{s_{j}^{k}} \left(u_{s_j^k}^n\right)\right)\right\}\right\} $$
(15)

where \(B_{s_{j}^{k}}(u_{s_j^k}^n)\) is the inverse function of \(u_{s_j^k}^n\), and so is \(P_{s_{j}^{k}}(u_{s_j^k}^n)\).

In fact, there are many Bayesian Nash equilibria for both LPMS and MPLS. Consider one equilibrium price \(x^n\), for example. The buyer's baseline is that the price is lower than its potential benefit, i.e., \(x^n<B_{{s_j}^k}^n\), while the seller's baseline is that the taxed price is higher than its reserved cost, i.e., \(\varsigma x^n>C_{p_i}^n\). Therefore, trade occurs in the gridded region of Fig. 6, where all the \((C_{p_i}^n,B_{{s_j}^k}^n)\) pairs satisfy the double auction game equilibrium.

Fig. 5 Reciprocal negotiation mechanism within the SN

Fig. 6 Trade space of the proposed double auction game

6 Reciprocal negotiation in the secondary network

In the cognitive network, owing to the self-interested character of the participants, there is broad competition among PNs, among SNs, and between the two. However, we assume a reciprocal negotiation mechanism within each SN: secondary users in the same network make a concerted effort to compete for spectrum access. Hence, each SN generates a uniform bidding policy \(\Uptheta\{s_{j}^{k},P_{p_i}^n\}\) according to its community strategy. In this section, we propose four bidding strategies depending on the users’ bidding policy. Each bidding strategy could be executed by an actual head node in the SN, or by a virtual head node played by any of the SUs. Fig. 5 depicts the reciprocal negotiation architecture within the secondary network.

6.1 Maximal bidding case

In this case, the SN generates a bidding policy according to the maximal function (16) during the auction game. Each SN announces a high bidding price; this suits bad channel conditions or strict primary networks, under which each SN is apt to bid at a higher price within its payment ability. Hence, under this policy the SN presents the network bidding price \(P_{p_i}^n\) of the secondary user \(s_{j}^{k}\) selected by (16). Correspondingly, the network utility equals the utility of user \(s_{j}^{k}\).

$$ \Uptheta\left(s_{j}^{k},P_{p_i}^n\right) =arg max_{SN_i}P_{s_{j}^{k}}^n\left(g_{s_{j}^{k}}^n, \beta_{s_{j}^{k}}^n\right) $$
(16)

6.2 Minimal bidding case

In this case, the SN generates a bidding policy according to the minimal function (17) during the auction game. Each SN announces a minimal bidding price; this suits good channel conditions or lenient primary networks, under which each SN is apt to bid at a lower price for renting. Hence,

$$ \Uptheta\left(s_{j}^{k},P_{p_i}^n\right) =arg min_{SN_i}P_{s_{j}^{k}}^n\left(g_{s_{j}^{k}}^n,\beta_{s_{j}^{k}}^n\right) $$
(17)

6.3 Average bidding case

In this case, the SN generates a bidding policy according to the average function (18) during the auction game. Each SN announces an average price, which is a middle-ground bidding policy.

$$ \Uptheta\left(s_{j}^{k},P_{p_i}^n\right) =ave_{SN_i}P_{s_{j}^{k}}^n\left(g_{s_{j}^{k}}^n, \beta_{s_{j}^{k}}^n\right) $$
(18)

6.4 Urgency dependent bidding case

In this case, the SN generates a bidding policy according to the packet urgency function (19) during the auction game. Each SN announces a fixed bidding price on each channel without considering the channel variance or signal strength; thus, it requires less computation and a simpler sensing process.

$$ \Uptheta\left(s_{j}^{k},P_{p_i}^n\right) =arg max_{SN_i}\varphi_{s_{j}^{k}}\left(g_{s_{j}^{k}}^n, \beta_{s_{j}^{k}}^n\right) $$
(19)
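The four bidding policies (16)-(19) can be summarized in one small dispatcher. This is a sketch under stated assumptions: the per-user payoffs of (6) and the urgency values \(\varphi_{s_{j}^{k}}\) are supplied as dictionaries, the strategy labels are hypothetical, and for the urgency-dependent case only the selected user is returned because the value of the fixed price is not specified in this section.

```python
from statistics import mean

def network_bid(user_payoffs, user_urgency, strategy):
    """Return (representative user or None, network bidding price or None).

    user_payoffs -- {user_id: payoff P of Eq. (6) on the current channel}
    user_urgency -- {user_id: varphi, accumulated bits in the queue}
    strategy     -- "max", "min", "ave" or "urg" (illustrative labels)
    """
    if strategy == "max":      # Eq. (16): highest payoff in the SN
        user = max(user_payoffs, key=user_payoffs.get)
        return user, user_payoffs[user]
    if strategy == "min":      # Eq. (17): lowest payoff in the SN
        user = min(user_payoffs, key=user_payoffs.get)
        return user, user_payoffs[user]
    if strategy == "ave":      # Eq. (18): no single user represents the SN
        return None, mean(user_payoffs.values())
    if strategy == "urg":      # Eq. (19): the most urgent user is selected
        user = max(user_urgency, key=user_urgency.get)
        return user, None      # fixed price left to the caller (not specified)
    raise ValueError(f"unknown strategy: {strategy}")

# Hypothetical SN with three users.
payoffs = {"s1": 4.0, "s2": 10.0, "s3": 7.0}
urgency = {"s1": 900, "s2": 100, "s3": 500}
bids = {name: network_bid(payoffs, urgency, name)
        for name in ("max", "min", "ave", "urg")}
```

In this example the urgency-dependent policy selects a different user than the maximal policy, which illustrates why its outcome depends on packet arrivals rather than on channel quality.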

7 Simulation and performance evaluation

In what follows, we first initialize the network topology and working mode. After analyzing the equilibrium space of the double auction game from the supply-and-demand relationship, we present the numerical results for the LPMS and MPLS scenarios, specifically for the four bidding cases. Both network scenarios are examined from three aspects: (1) utility evaluation for PNs and SNs; (2) effect of the expectation parameter \(\phi^n_{p_i}\); (3) effect of the learning parameter \(\lambda_{s_{j}^{k}}\). Finally, we provide a system performance comparison of our proposed approach with the centralized schemes.

7.1 Initialization and parameter setting of the cognitive network

To evaluate the performance of the proposed scheme, we model the PN as a multiuser OFDMA cellular network with an i-NodeB located in the center, and each PN is allocated subcarriers within a fixed spectrum band. Users in the SNs are assumed to have low mobility, to be randomly located within the cell, and to be allowed to access multiple channels simultaneously. We assume that the spectrum occupation stays stationary during one scheduling time \(T_s\), with \(T_s \gg T\). We also assume that the PN exchanges messages with the SNs through a dedicated pilot channel; users within the same SN share the bidding data and the respective broadcast messages with some security assurance, but are not allowed to overhear information exchanged by other SNs. The other simulation parameters are shown in Table 1.

Table 1 System parameters

7.2 Existence of Bayesian Nash equilibrium

We first test the existence of the Bayesian Nash equilibrium in the two scenarios by analysing the relationship of the bidding payoff with the cost price of the PN and the benefit evaluation of the SN in Fig. 7. As packets accumulate, the payoff grows linearly according to (6). In particular, when \(\varphi_{s_{j}^{k}}\) is small, the payoff is lower than the channel cost \(C_{p_i}^n\) of the PN, which is therefore reluctant to release the channel. On the other hand, the SN accepts the deal only if the payoff is lower than the potential benefit \(B_{s_j^k}^n\). Therefore, the feasible equilibrium outcomes of the proposed DAST scheme fall between \(C_{p_i}^n\) and \(B_{s_j^k}^n\). More precisely, since the tax in (11) is levied between the dealers, \(\frac{C_{p_i}^n}{\varsigma}<x^n<B_{s_j^k}^n\) should be guaranteed.

Fig. 7 The Bayesian Nash equilibrium of the proposed DAST scheme

In the LPMS scenario, where the PN works as the auctioneer, the PN chooses the price that maximizes \((x^n-\frac{C_{p_i}^n}{\varsigma})\) within the entire equilibrium region. In MPLS, where the SN works as the auctioneer, the SN prefers to maximize \((B_{s_j^k}^n-x^n)\). Therefore, different strategies (circled in the figure) become the final outcome depending on the network scenario and the packet arrivals.
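The selection of a final price within the equilibrium region can be illustrated with a few candidate prices. The sketch assumes that trade requires the seller's after-tax revenue to exceed its cost and the price to stay below the buyer's benefit, consistent with the two objectives stated above; the function names and numbers are hypothetical.

```python
def feasible_prices(candidates, cost, benefit, tax):
    """Keep the prices x with cost / tax < x < benefit.

    Assumption: the seller needs tax * x > cost and the buyer needs x < benefit.
    """
    return [x for x in candidates if cost / tax < x < benefit]

def preferred_price(candidates, cost, benefit, tax, scenario):
    """Pick the auctioneer's preferred feasible price, or None if none exists."""
    if scenario not in ("LPMS", "MPLS"):
        raise ValueError("scenario must be 'LPMS' or 'MPLS'")
    feasible = feasible_prices(candidates, cost, benefit, tax)
    if not feasible:
        return None
    # LPMS: the PN maximises x - cost/tax, i.e. the highest feasible price.
    # MPLS: the SN maximises benefit - x, i.e. the lowest feasible price.
    return max(feasible) if scenario == "LPMS" else min(feasible)

# Hypothetical candidate prices, e.g. produced by the four bidding strategies.
prices = [2.0, 5.0, 7.0, 12.0]
lpms_price = preferred_price(prices, cost=3.0, benefit=10.0, tax=0.75, scenario="LPMS")
mpls_price = preferred_price(prices, cost=3.0, benefit=10.0, tax=0.75, scenario="MPLS")
```

The same set of candidate bids therefore leads to different outcomes in the two scenarios, depending on which side acts as the auctioneer.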

7.3 Performance results of LPMS

7.3.1 Utility evaluation

We consider an LPMS scenario under the default parameters and present the averaged utility of PNs and SNs as the SNR varies in Figs. 8 and 9. There is a universal descending trend for the PNs but an ascending trend for the SNs in all four bidding cases. This is because the channel condition enters (3) as a negative term but (7) as a positive term. Specifically, when the channel becomes bad, the PN gets a higher utility value and is thus reluctant to release the spectrum. In the SNs, users get a higher utility evaluation once they sense preferred channels with better conditions, which means that they desire this spectrum access, and vice versa.

Fig. 8 The effect of forecasting parameter \(\phi^n_{p_i}\) of LPMS for PNs and SNs

Fig. 9 The effect of the learning adaptive parameter \(\lambda_{s_{j}^{k}}\) of LPMS

As to the four bidding strategies, the utility of the PN is highest with the maximal bidding strategy (Up(Max_bidding)) and lowest with the minimal one (Up(Min_bidding)). The average (Up(Ave_bidding)) and packet-urgency (Up(Urg_bidding)) strategies lie in between, but Up(Urg_bidding) fluctuates due to the arbitrary packet arrivals. The bidding strategy of an SN is negotiated among the members of the same network, and different bidding strategies affect the business in the spectrum auction market. Conversely, a higher bidding price results in a lower utility for the SUs, so the four bidding cases are ordered in reverse in Fig. 9.

7.3.2 Effect of expectation

Figure 10 shows the average utility of PNs and SNs as \(\phi^n_{p_i}\) varies. The parameter \(\phi^n_{p_i}\) forecasts the transmission of the PN in the next slot with respect to the QoS grade, burst packet arrivals, etc. As \(\phi^n_{p_i}\) grows, the utility evaluation for the PNs decreases, which means that they are unlikely to sell the spectrum because they will probably use it themselves. Owing to the adaptive mechanism of the SU with \(\lambda_{s_{j}^{k}}\), the SU has to raise its payoff to get more transmission opportunities, which decreases its utility. As a result, the Us values keep a roughly descending trend, but with a smaller slope than the Up values.

Fig. 10 The utility of PNs with the variance of SNR of LPMS

7.3.3 Effect of learning parameters

We also examine the learning parameter \(\lambda_{s_{j}^{k}}\) in (6). The adaptive parameter enables the users in an SN to adjust the bidding price according to not only their own willingness but also the dynamic exchange in the market. Accordingly, it influences the utility evaluation of both PNs and SNs. From Fig. 11, we can see that the utility of the SN decreases when \(\lambda_{s_{j}^{k}}\) increases, whereas the utility of the PN takes an ascending trend. The explanation is that a higher \(\lambda_{s_{j}^{k}}\) means that the SN is more willing to obtain this spectrum access; it therefore pays more, which reduces its own utility evaluation but raises the utility of the PNs.

Fig. 11 The utility of SNs with the variance of SNR of LPMS

7.4 Performance results of MPLS

Generally, the performance in the MPLS scenario is similar to that in LPMS, except for the different decision maker. In the LPMS scenario, the PNs decide to whom to release the spectrum, whereas in MPLS the SNs decide from whom to lease it. We discuss the performance of MPLS from the same three aspects. We do not give a detailed analysis, but only a performance comparison with the figures above.

7.4.1 Utility evaluation

Since the utilities of PNs and SNs in the MPLS scenario follow the same trends under the four bidding cases, we do not plot the results but report them in Tables 2 and 3 as percentage differences relative to the LPMS scenario, which gives a clear comparison of the two scenarios.

Table 2 Percentage of the utility evaluation for PNs in the MPLS scenario compared with LPMS, \((U_{LPMS}-U_{MPLS})/U_{LPMS}\)
Table 3 Percentage of the utility evaluation for SNs in the MPLS scenario compared with LPMS, \((U_{MPLS}-U_{LPMS})/U_{LPMS}\)

Table 2 presents the percentage change of Up in the MPLS scenario relative to that in Fig. 8 (LPMS). The utility of the PNs in MPLS is much lower than in LPMS for every bidding case except Up(Urg_bidding). This is because in this scenario the secondary networks make the decision and desire a lower payoff, which cuts the revenue of the PNs. Meanwhile, in LPMS the PNs only have to satisfy constraint (C1) in (12). Therefore, the utility of PNs in MPLS serves as a lower bound, while that in Fig. 8 serves as an upper bound. As to the Up(Urg_bidding) case, some negative percentages occur since it depends heavily on the arbitrary packet arrivals.

Table 3 presents the percentage change of Us in the MPLS scenario relative to that in Fig. 9 (LPMS). The utility of the SUs in MPLS is much higher than in the LPMS scenario for every bidding case. As stated above, in MPLS a lower payoff translates into income for the SNs if the usage benefit stays the same. Meanwhile, the SNs only have to satisfy constraint (C1) in (13). Therefore, the utility of SNs in MPLS serves as an upper bound, while that in Fig. 9 serves as a lower bound.

7.4.2 Effect of expectation

In order to observe the effect of the expectation parameter, we vary \(\phi^n_{p_i}\) in (1) over [0.01, 0.1] in Fig. 12. We do not analyse this in detail, since similar results can be found in Fig. 10.

Fig. 12 The effect of forecasting parameter \(\phi^n_{p_i}\) of MPLS for PNs and SNs

7.4.3 Effect of learning parameters

Similarly, we observe the influence of the learning parameter \(\lambda_{s_{j}^{k}}\) in (7), restricting its range to [0.01, 0.1]. Compared with Fig. 11, Up and Us keep consistent trends; however, the utility of the SN decreases in an upward-convex manner in Fig. 11 but in a downward-convex manner in Fig. 13, and vice versa for the utility of the PN. These results are caused by the difference in the decision maker and in the interaction procedures, as depicted in Figs. 3 and 4.

Fig. 13 The effect of the learning adaptive parameter \(\lambda_{s_{j}^{k}}\) of MPLS

7.5 Comparison with the traditional approaches

As the existing pricing-based spectrum access methods differ in network scenario and do not consider the differentiation of physical channels, it is difficult to compare our proposed method with them directly. Nevertheless, we compare the frequency efficiency of the proposed auction-based spectrum access approach with traditional CR access methods, namely random access of SUs once a spectrum hole is sensed (Random Access) [15] and maximal channel gain access of SUs via a dedicated centralized secondary base station (SBS) (H-max Access); the normal spectrum usage without CR (Without CR) is also included in the simulation as a benchmark.

In Fig. 14, taking an SNR of 8 dB for illustration, the CR access schemes produce a frequency efficiency at least 65.69 % higher than the Without CR method, among which H-max Access achieves the highest gain, 219.04 %. This is because H-max Access is the optimal throughput maximization method. The proposed DAST scheme performs much better than Random Access, with a 7 % loss relative to H-max Access. However, the centralized H-max Access method is difficult to implement. First, it needs an SBS node to collect information and allocate resources, whereas our proposed scheme simplifies the deployment of the cognitive network and requires no modification of the existing network. Second, the SBS is required to know all SNs’ utility functions or preferences, which are often private information of the networks and not common knowledge. Furthermore, in heterogeneous SNs, it is impossible to handle spectrum access in a centralized manner since no administrative node exists. Therefore, H-max Access presents a performance that can hardly be reached in a distributed heterogeneous network, while our proposed distributed intelligent scheme not only achieves spectrum performance comparable to the traditional centralized approaches but is also more feasible and rational.

Fig. 14 The frequency efficiency comparison

As to the communication and control overhead, we define the number of channel uses \(\epsilon\) to quantify the channel cost. In DAST, there is a three-way handshake between PNs and SNs according to Figs. 3 and 4, in both the “accept” and the “refuse” situations. In each SN, users propose their preference once and are informed of the transmission decision (Y or N) once. Therefore,

$$ \epsilon_{DAST}=I\times K\times 3+\sum_{k=1}^{K}{N_k}\times 2 $$
(20)

In H-max Access, since it is hardly possible to set up a super-node that collects the vacancies of the PNs and reuses them, the SNs have to send access requests to the PNs one by one. However, collisions happen in this case, since one SN’s favorite channels coincide with those of other SNs. Let \(c_{times}\) be the average number of collisions. Therefore,

$$ \epsilon_{\it H-max Access}=I\times K\times c_{times}+\sum_{k=1}^{K}{N_k}\times 2 $$
(21)

We can see that DAST is on a par with H-max Access when \(c_{times}\) is small; however, when the SNs concentrate on a few PNs, collisions multiply unless avoidance measures are taken. On the other hand, the computational complexity of DAST is not high, since it only requires finding the maximum of a sequence.
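The two overhead expressions (20) and (21) differ only in the factor multiplying \(I\times K\), so they coincide when \(c_{times}=3\). A short sketch makes the comparison explicit; the function names and the network sizes are illustrative assumptions.

```python
def overhead_dast(num_pn, users_per_sn):
    """Eq. (20): I * K * 3 + 2 * sum_k N_k.

    num_pn       -- I, number of primary networks
    users_per_sn -- list [N_1, ..., N_K], users in each secondary network
    """
    return num_pn * len(users_per_sn) * 3 + 2 * sum(users_per_sn)

def overhead_hmax(num_pn, users_per_sn, c_times):
    """Eq. (21): I * K * c_times + 2 * sum_k N_k."""
    return num_pn * len(users_per_sn) * c_times + 2 * sum(users_per_sn)

# Hypothetical network: I = 2 PNs and K = 3 SNs with 5, 3 and 4 users.
sizes = [5, 3, 4]
dast = overhead_dast(2, sizes)
hmax_equal = overhead_hmax(2, sizes, c_times=3)   # break-even point
hmax_busy = overhead_hmax(2, sizes, c_times=5)    # more collisions
```

Beyond three collisions on average, the overhead of H-max Access exceeds that of DAST under these two expressions.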

8 Conclusion and further discussion

In this paper, we have presented an OODA circle based cognitive behavior model to illustrate the spectrum competition between PNs and SNs in the wireless cognitive network. Considering the disturbance of cognitive conditions, the market environment of spectrum trading, etc., we formulated two utility evaluations, for the PNs and SNs respectively, based on which we proposed a double auction spectrum trading method (DAST) for the LPMS and MPLS scenarios. Different from traditional spectrum sharing approaches, the SNs are allowed to make bidding decisions on the resource simultaneously and independently, out of self-interest. The bilateral interaction with the introduced market trading provides a fairer spectrum access opportunity, building on a cognition-equipped network. Simulation results show the effectiveness of the proposed spectrum trading methodology and confirm its performance advantages under different bidding strategies in the cognitive environment.

Future work includes incorporating a learning mechanism into the decision model, which forecasts the possibility and exploits the observed histories of spectrum usage and trading. Furthermore, how to make tradeoffs among the simulation fidelity, reliability, and complexity of the cognitive network, as well as how to incorporate dynamic environmental information, is a challenging issue.