Keywords

1 Introduction

The deployment of small cells co-existing with the legacy macrocell network, constitutes a straightforward and effective approach to support the deluge of data traffic induced in the uplink of 5G wireless networks. Though small cells are typically low power, low cost, short range wireless transmission systems (base stations), they have all the basic characteristics of conventional base stations and are capable of handling high data rate for individual users [1, 2]. Accordingly, they can efficiently reuse the spectrum across different geographical areas, thus, improving spectrum efficiency and enhancing network coverage and capacity. Towards the same direction of contributing to improved spectral efficiency and massive connectivity, non-orthogonal multiple access (NOMA) technology has been adopted by 5G networks, allowing multiple users to multiplex in the power domain over the same time/frequency/code resources. The successive interference cancellation (SIC) technique is, then, applied at the receiver to mitigate part of the interference, by sequentially decoding the received signals based on their signal strength [3].

One of the key problems and challenges in this emerging environment, is the inter-user interference caused by different users transmitting over the same resources. The latter heavily depends on the joint user to small cell association and corresponding power allocation problem. Therefore, the development of efficient resource allocation schemes that deal with this problem in NOMA small cell networks (SCNs) is of significant research and practical importance.

1.1 State of the Art and Motivation

Several research works (e.g., [4, 5]) have addressed the emerging resource orchestration problem in NOMA SCNs, and in particular the joint problem of user association and power allocation. For instance, in [4], the problem is formulated as a coalition formation game that associates users with the small cell, for which the total power is minimized in the optimal partition of users. In [5], a similar problem focusing on the joint user clustering and power allocation, is treated in two iterative stages to avoid the exhaustive search of user clusters. Nevertheless, the majority of existing works on the topic of user association and/or interference mitigation, presume perfect knowledge of the global channel and/or network information (e.g., channel state information (CSI)), which is either impossible to have or impractical to obtain. Considering the situation when only statistical CSI is available at the small base stations (SBSs), the private information of the user devices regarding their experienced channel conditions could be utilized to ease the resource allocation procedure. In this context, contract theory, enables the modeling of an incentive mechanism in 5G SCNs under a practical scenario of incomplete information, where the users’ private information (i.e., transmission power, wireless channel characteristics) are not known by the SBSs. Based on contract theory, the negotiations among the SBSs and the users are modeled under a network economics framework aiming to identify the SBSs’ optimal rewards provided to the users in order the latter to determine their optimal transmission power. Indicatively, we note that the work in [6] has attempted to deal with the specific CSI incompleteness problem based on contract theory principles, for heterogeneous long-term evolution advanced (LTE-A) networks. Nonetheless, the proposed approach assumes a central entity determining the optimal user association and contracts, while it is targeting orthogonal frequency division multiple access (OFDMA) environments in order to mitigate or eliminate interference, which in turn reduces the spectral efficiency due to the requirement of channel access orthogonality.

1.2 Contribution and Outline

Targeting both spectral efficiency and application feasibility, it is practically beneficial to assume that the total system available bandwidth is subdivided into orthogonal frequency chunks (i.e., channels). In this regard, adjacent small cells are allocated to different frequency chunks that do not interfere with each other. Within such communication environment, our contribution aims at introducing a resource orchestration framework through a distributed user association and power allocation scheme which reduces computational complexity, while accounting for the CSI incompleteness. Specifically, we propose a contract-based approach, in which the small cell, acting as a monopolist, determines the users’ transmit power level so as to be able of decoding their signals, and rewards them inversely proportionally to their interference in order to incentivize them to accept the proposed contract. The key idea is to offer the right contract item to each user, so that all users have the incentive to truthfully reveal their CSI. Furthermore, the determination of the small cell to which each user is associated with, is performed by the users individually based on the evaluation of the contract items, through a reinforcement learning (RL) algorithm. In a nutshell, our work offers a resource orchestration framework applicable in SCNs that differs from previous existing efforts in the literature, in the sense that it: a) considers an interference-limited wireless environment, adopting the use of NOMA technology that has arisen as a promising access technology in 5G networks, b) formulates the overall resource orchestration problem under an incomplete CSI scenario, and c) promotes the use of distributed approaches in the decision making, eliminating the need for centralized decision making entities.

The rest of the paper is organized as follows. Section 2 contains the considered system model and the introduced SBSs’ and users’ utility functions. In Sect. 3, we present the optimal contract design under incomplete information, while we determine the SBSs’ optimal rewards to incentivize the users to transmit with their optimal uplink transmission powers. The users’ association to the small cells based on reinforcement learning is highlighted in Sect. 4. In Sect. 5, indicative numerical results validating the operation of the contract theoretic framework are presented, while Sect. 6 concludes the paper.

2 System Model

We consider a heterogeneous dense wireless network consisting of a set of users \(U=\{1,\dots , |U|\}\) and a set of SBSs \(C=\{1,\dots , |C|\}\). A set \(\mathbb {U}_c\) of cardinality \(|\mathbb {U}_c|\) users represents the users associated with cell c. The channel gain of user u communicating with the SBS c is denoted by \(G_u^c=k_u/(d_u^c)^a\), where \(k_u\) is a lognormal distributed random variable with mean 0 and variance \(\sigma ^2\), \(d_u^c\) [m] is the distance between user u and SBS c and a is the corresponding path loss exponent. To consider realistic scenarios, the SBS lacks specific information of users’ transmission characteristics, i.e., CSI, type of user. Adopting the principles of contract theory [7], each SBS aims at incentivizing the users to exhibit their improved transmission characteristics by providing them some transmission-related rewards. The goal is to determine an equilibrium point, where both the SBS and the users maximize their utilities, in terms of collecting and transmitting information, respectively. The type of the user is defined as \(t_u^c=(G_u^c\cdot \sum \{1/G_u^c\})^{-1/2}, t_u^c\in (0,1]\), and without loss of generality, we consider \(|U_c|\) types of users with \(t_1^c<\dots<t_u^c<\dots <t_{|\mathbb {U}_c|}^c\). Adopting the NOMA technique and by applying SIC at the receiver in the uplink, the users with worse channel gain conditions are alleviated by the interference caused by the users with better channel conditions, as the signals of the latter are decoded first at the receiver and excluded from the interference sensed by the users with worse channel conditions. Thus, the SBS c rewards the user u with \(r_u^c=\rho /I_u^c\), where \(\rho \in \mathbb {R}^+\) is the reward factor and \(I_u^c\) is the interference that user u senses, while being associated with SBS c. The physical meaning and interpretation of the reward is that the SBS provides a greater reward to users that experience less interference. Thus, a user of higher type \(t_u^c\) (i.e., worse channel conditions), senses less interference and it is rewarded more by the SBS (for fairness purposes), as it is expected to transmit with higher power (i.e., invest greater effort).

Figure 1 summarizes the overall operation of the proposed framework. At the beginning of each time slot, the users select an SBS to be associated with in a distributed manner following a reinforcement learning approach (Sect. 4). Then, the optimal contracts are determined by each SBS for the corresponding users residing within each small cell (Sect. 3). The overall nested procedure is executed over the time guaranteeing the smooth operation of the 5G SCNs.

Fig. 1.
figure 1

System model & operation framework.

3 Optimal Contract Design

The considered heterogeneous dense network is characterized by asymmetry of information, as the exact users’ types and transmission powers are unknown to each SBS. Instead each SBS c only knows the probability \(Pr_u^c\) that the user u is of type \(t_u^c\) and \(\sum \limits _{u=1}^{|\mathbb {U}_c|}Pr_u^c=1\). The utility that SBS c experiences from user u is defined as \(U_c^u=P_u^c-\mathcal {C}\cdot r_u^c\), expressing the SBS’s satisfaction in terms of its operation by receiving a signal with high power strength \(P_u^c\), while being charged with the cost \(\mathcal {C}\cdot r_u^c\) of providing incentives through the rewards to the user. Thus, the overall SBS’s utility is \(U_c=\sum \limits _ {u=1}^{|\mathbb {U}_c|}[Pr_u^c\cdot (P_u^c-\mathcal {C}\cdot r_u^c)]\). On the other hand, the user’s utility is defined as \(U_u^c(P_u^c)=t_u^c\cdot e( r_u^c)-\mathcal {C'}\cdot P_u^c\), which expresses the user’s satisfaction \(t_u^c\cdot e(r_u^c)\) from receiving the reward \(r_u^c\) from the SBS c, while considering its personal transmission cost \(\mathcal {C'}\cdot P_u^c\). It is noted that the user’s satisfaction depends on its type \(t_u^c\) and on the evaluation function \(e(r_u^c)\) (with \(e(0)=0, e'(r_u^c)>0, e''(r_u^c)<0\)), which captures the user’s perception and personal satisfaction from the reward.

Following the principles of contract theory, the SBS establishes a personalized contract \((r_u^c,P_u^c)\) with each user in the small cell, where the user invests its personal effort \(P_u^c\) (i.e., uplink transmission power) and the SBS rewards the user with \(r_u^c\). A contract is feasible if the following conditions hold true: (i) the Individual Rationality (IR), i.e., the contract should guarantee that the user’s utility is non-negative \(U_u^c(P_u^c)\ge 0, \forall u\in \mathbb {U}_c\); and (ii) the Incentive Compatibility (IC), i.e., each user must select that contract which is designed for its type \(t_u^c\cdot e( r_u^c)-\mathcal {C'}\cdot P_u^c\ge t_u^c\cdot e(r_{u'}^c)-\mathcal {C'}\cdot P_{u'}^c, \forall u,u'\in \mathbb {U}_c, u\ne u'\). Additionally, the following three conditions must hold true in order the contract to be feasible.

Proposition 1

For any feasible contract \((r_u^c,P_u^c)\), the following must hold true: \(r_u^c>r_{u'}^c \Longleftrightarrow t_u^c>t_{u'}^c\) and \(r_u^c=r_{u'}^c \Longleftrightarrow t_u^c=t_{u'}^c\).

Proposition 1 can be proven by arguing as follows. We can prove separately the sufficiency and the necessity of the described condition. Regarding the sufficiency, we can write the incentive compatibility condition for two types of users, e.g., \(t_u^c>t_{u'}^c\), add the inequalities, and exploit the strictly increasing property of the evaluation function \(e( r_u^c)\) to conclude that \(r_u^c>r_{u'}^c\). Regarding the necessity, we can work backwards by considering the strictly increasing property of \(e( r_u^c)\), build the summation of the incentive compatibility constraints for two users, and conclude that \(t_u^c>t_{u'}^c\). Similarly, we can show \(r_u^c=r_{u'}^c \Longleftrightarrow t_u^c=t_{u'}^c\).

Proposition 2

A user of higher type, i.e., \(t_1^c<\dots<t_u^c<\dots <t_{|\mathbb {U}_c|}^c\) (i.e., worse channel conditions), will receive a greater reward from the SBS c to be incentivized to be served by this SBS, i.e., \(r_1^c<\dots<r_u^c<\dots <r_{|\mathbb {U}_c|}^c\), and it will transmit with greater power (i.e., effort), i.e., \(0<P_1^c<\dots<P_u^c<\dots <P_{|\mathbb {U}_c|}^c\).

The proof of this proposition intuitively stems from Proposition 1, given that \(t_1^c<\dots<t_u^c<\dots <t_{|\mathbb {U}_c|}^c\).

Proposition 3

A user of higher type, i.e., \(t_1^c<\dots<t_u^c<\dots <t_{|\mathbb {U}_c|}^c\), receives higher utility to be incentivized by the SBS c, i.e., \(U_1^c<\dots<U_u^c<\dots <U_{|\mathbb {U}_c|}^c\).

Proposition 3 proof can be concluded based on the following steps. We consider the incentive compatibility constraint for one user, and we analyze the inequality by considering a user of lower type, i.e., \(t_u^c>t_{u'}^c\), and sequentially we conclude that \(U_u^c>U_{u'}^c\). Due to space limitations only the key arguments required for the proofs of Propositions 13 are provided here, while the detailed intermediate steps of the proofs are presented in [8, 9].

Each SBS aims at maximizing its overall utility in order to be able to collect and properly decode the users’ transmitted signals (as dictated by the received SINR). In parallel, each user should satisfy all of its personal constraints, as they have been described above, in order to be willing to be associated with the specific cell. Thus, the following distributed optimization problem is solved at each SBS, by jointly considering the SBS and corresponding users’ sides.

$$\begin{aligned} \mathbf{P1}:&\max \limits _{(r_u^c,P_u^c)_{\forall u\in \mathbb {U}_c}} U_c=\sum \limits _ {u=1}^{|\mathbb {U}_c|}[Pr_u^c\cdot (P_u^c-\mathcal {C}\cdot r_u^c)]\end{aligned}$$
(1a)
$$\begin{aligned}&\mathbf{s}.t. \; t_u^c\cdot e(r_u^c)-\mathcal {C'}\cdot P_u^c\ge 0, \forall u\in \mathbb {U}_c\end{aligned}$$
(1b)
$$\begin{aligned}&t_u^c\cdot e( r_u^c)-\mathcal {C'}\cdot P_u^c\ge t_u^c\cdot e( r_{u'}^c)-\mathcal {C'}\cdot P_{u'}^c, \forall u,u'\in \mathbb {U}_c, u\ne u'\end{aligned}$$
(1c)
$$\begin{aligned}&0\le r_1^c<\dots<r_u^c<\dots <r_{|\mathbb {U}_c|}^c \end{aligned}$$
(1d)

The optimization problem P1 is non-convex, thus, in order to solve it, we reduce its constraints. By performing appropriate derivations we can easily show that the constraints (1b) and (1c) can be reduced to (2b) and (2c) [8, 9]. In particular, to show that the constraint (1b) can be reduced to (2b), we consider the incentive compatibility constraint for a user and by considering the strictly increasing property of the evaluation function, as well as the monotonicity of the users’ types (proposition 2), we can sequentially rewrite the incentive compatibility constraint to be reduced to the inequality \(t_u^c\cdot e( r_u^c)-\mathcal {C'}\cdot P_u^c\ge t_1^c\cdot e( r_{1}^c)-\mathcal {C'}\cdot P_{1}^c\). Then, by exploiting the individual rationality condition, we can conclude to the constraint (2b). Moreover, in order to show that the constraint (1c) is reduced to (2c), we consider the downward (i.e., \(u,u', u'\in \{1,\dots ,u-1\}\)), the upward (i.e., \(u,u', u'\in \{u+1,\dots ,|\mathbb {U}_c|\}\)), and the local downward (i.e., \(u,u-1, \forall u,u-1\in \mathbb {U}_c\)) incentive compatibility constraints. Then, we can show that all the downward incentive compatibility constraints can be represented by the local downward incentive compatibility constraint, while all the upward incentive compatibility constraints can be equivalently captured by the local downward incentive compatibility constraints. Thus, the optimization problem P1 can be rewritten as:

$$\begin{aligned} \mathbf{P2}:&\max \limits _{(r_u^c,P_u^c)_{\forall u\in \mathbb {U}_c}} U_c=\sum \limits _ {u=1}^{|\mathbb {U}_c|}[Pr_u^c\cdot (P_u^c-\mathcal {C}\cdot r_u^c)]\end{aligned}$$
(2a)
$$\begin{aligned}&\mathbf{s}.t. \; t_1^c\cdot e(r_1^c)-\mathcal {C'}\cdot P_1^c = 0\end{aligned}$$
(2b)
$$\begin{aligned}&t_u^c\cdot e( r_u^c)-\mathcal {C'}\cdot P_u^c = t_u^c\cdot e( r_{u'}^c)-\mathcal {C'}\cdot P_{u'}^c, \forall u,u'\in \mathbb {U}_c, u\ne u'\end{aligned}$$
(2c)
$$\begin{aligned}&0\le r_1^c<\dots<r_u^c<\dots <r_{|\mathbb {U}_c|}^c \end{aligned}$$
(2d)

We can easily prove that P2 is a convex programming problem by checking the Hessian matrix. Thus, P2 can be solved by applying the KKT conditions, and accordingly the optimal users’ transmission power vector \(\mathbf {P_c^*}=[P_1^{c*},\dots ,P_{|\mathbb {U}_c|}^{c*}]\) and the optimal SBS’s rewards vector \(\mathbf {r_c^*}=[r_1^{c*},\dots ,r_{|\mathbb {U}_c|}^{c*}]\), can be determined.

4 Users’ Association Based on Reinforcement Learning

In order to support the operation of the aforementioned framework, users are enabled to autonomously select the specific cell to be associated with, based on a reinforcement learning (RL)-based approach. In this work, we adopted the reinforcement learning mechanism of the stochastic learning automata (SLA) to realize this process in a distributed manner, though other alternatives may be considered as well (e.g., Max-logit, Binary-logit) [10]. In particular, each user acts as a learning agent, where at each iteration of the SLA algorithm makes a probabilistic-based selection of an SBS to be associated with, following a simple probabilistic rule. To realize this, each cell is characterized by a reputation function \(REP_c\) depending on socio-physical characteristics, e.g., average users’ distance from an SBS, average users’ transmission power invested to communicate with an SBS, average users’ received rewards, etc., similarly to [11]. Accordingly, the cell’s reputation functions are incorporated within the RL algorithms to enable the users to probabilistically learn their best cell association.

5 Numerical Results

In this section, we present some numerical results that validate the operation and performance of the contract-theoretic component of our proposed framework, obtained through modeling and simulation. Given that adjacent cells do not interfere with each other, we focus on the operation of one indicative cell, assuming that users have already performed their association with each SBS. In the following, we consider a small cell of 500-meter radius with an SBS placed in the center of the cell and \(|U_c|=10\) users placed in increased distance from the SBS (with a step of 50-meters). Specifically, we define one type for each user as analyzed in Sect. 2, presuming that all users have the same probability \(Pr^c_u\) of belonging to each type. As mentioned earlier, the channel gain of user u communicating with the SBS c is modeled as \(G_u^c=k_u/(d_u^c)^a\), where \(k_u\) is assumed to be a lognormal distributed random variable with mean 0 and variance \(\sigma ^2=8dB\), and the corresponding path loss exponent is \(a=4\). Furthermore, the maximum uplink transmission power of the users is \(P^c_{u,max}=0.7\) [W], while the reward factor is \(\rho =10^{-15}\), the SBS’s unit cost is \(\mathcal {C}=0.7\), the user’s unit power cost is \(\mathcal {C'}=0.4\), and the user’s evaluation function is assumed \(e(r_u)=\sqrt{r_u}\).

Figure 2a presents the users’ effort, i.e. optimal uplink transmission power, and reward vs. their index, while similarly Fig. 2b shows the achievable utilities for both the users and the cell. Logarithmic scale is used to better visualize the curves’ trend and differences in the values. The results in both Fig. 2a and 2b, validate the monotonicity behavior in the offered contracts. That is the higher the user type, the more effort is required and thus, the more the reward it receives, leading to larger utility for the user itself and the cell. This is well aligned with the fact that the measured interference at the receiver after performing the SIC technique for the user with the lowest channel gain (i.e., the highest type user) is impacted only by the background noise, which is very low. Moreover, from Fig. 2b it can be seen that all types of users receive a non-negative utility, being consistent with the individual rationality constraint imposed. To complement the evaluation of the contract feasibility, the utilities of two selected users (i.e. users with type 5 and 8) are examined in Fig. 2c, for different possible contracts offered by the SBS (represented in the horizontal axis by user types). Observing these results we note that following the proposed approach, each user achieves equal or higher utility from the other types, if and only if selects the contract item intended for its own type (dictated by the red dashed vertical line in the graphs), demonstrating the satisfaction of the incentive compatibility constraint.

Fig. 2.
figure 2

Operation validation of the proposed contract-theoretic framework.

6 Conclusions and Future Work

In this paper, the problem of resource orchestration, in terms of user to small cell association and power allocation, in the uplink of 5G NOMA-based small cell networks is studied. A reinforcement learning technique is adopted to enable the users to select the optimal small cell to be connected with, in an autonomous and distributed manner. Thereafter, a contract-theoretic approach is introduced to design user specific contracts in terms of determining the users’ optimal uplink transmission power and the small cell’s optimal rewards provided to the users, to incentivize them to perform in an interference limited manner in the network.

Our current and future work contains the extension of this framework by considering multiple reinforcement learning approaches for the user association and comparing them in terms of efficiency and computation complexity. Finally, a natural extension of this work focuses on the use of the proposed contract-theoretic framework for the end-to-end study of the network operation, by including the backhauling communication of the SBSs to the macro base station.