1 Introduction

The continuous growth of mobile data services has been characterized by a significant increase in the number of mobile devices and connections reaching 7.9 billion in 2015, and by the high penetration of smart devices that accounts for 89% of the global mobile data traffic [1]. In 2021, the mobile subscribers are expected to reach 10 billion the mobile data traffic in 2016 increased approximately by 60% compared to 2015 [2]. Within such a blooming market it is expected that the competition among Wireless Internet Service Providers (WISPs) covering similar geographical areas and/or groups of users will only intensify, providing to their potential customers a multitude of options, ranging from different QoS offerings, to different resource allocation schemes and pricing policies.

Within such a competitive communications paradigm, end users, empowered by the increased computational and communications intelligence of their smart devices, will be facing several flexibilities and challenges in dynamically selecting the most appropriate WISP to be served by, in accordance with the aforementioned options and offerings (e.g. pricing policy, WISP reputation, etc.), while properly controlling their limited power resources in order to meet their service quality requirements.

In recent years, several efforts have been devoted to studying the problem of resources’ pricing, mainly aiming at increasing WISP’s revenue and profit. Two main categories of pricing policies have been reported in the literature, i.e. static and dynamic. Static pricing policies are classified as flat-rate [3], usage-based pricing [4], tiered pricing [5], QoS-based pricing where multiple traffic classes with different support of users’ QoS prerequisites are created and charged with different static prices [6], as well as negotiated contracts [7]. Though such policies present simplicity in their implementation, their main drawback is that they do not enable the WISPs to properly adapt to real-time users’ needs and adjust their prices accordingly, therefore resulting either in low user satisfaction or poor utilization of available resources.

To overcome this drawback various dynamic pricing policies have been proposed, such as: a) raffle-based pricing, promoting the shift of user demand to off-peak periods [8], b) auction-based pricing [9], c) day-ahead pricing [10] and d) congestion-adaptable pricing, where the WISP monitors the network and adapts the announced prices based on the observed congestion in a real time manner [11]. Nevertheless, these approaches do not consider the problem of optimal pricing as most of them are either heuristic or empirical, while more importantly they a) underestimate the need to jointly consider pricing and resource allocation for supporting multiple types of services in a real time varying environment and b) ignore the multi-provider nature of the arising wireless competitive market.

Similarly, extensive research efforts have been devoted to the problem of power allocation in interference limited environments adopting various multiple access techniques, such as code division multiple access (CDMA) and non-orthogonal multiple access (NOMA). In 5G wireless networks, NOMA technique has been adopted to provide improved spectral efficiency, massive connectivity and low transmission latency and signaling cost. In [12], the authors propose two power allocation frameworks, based on channel state information experienced by NOMA users and on pre-defined Quality of Service (QoS) per user respectively. In [13], a power allocation framework is proposed while considering proportional fairness objective. The authors in [14] study the optimization multi-user power and channel allocation problem in NOMA systems via combining Lagrangian duality and dynamic programming in order to conclude to a competitive suboptimal solution. However, all these efforts have been treated in isolation from the critical and interrelated resource pricing problem especially under the emerging NOMA 5G wireless networks.

1.1 Paper contributions and outline

Our paper aims exactly at filling the aforementioned gaps and address directly the combined problem of WISP selection by the mobile customers and corresponding power allocation. Each WISP is characterized by a price and service-based reputation, formed based on its adopted pricing policy and its success to satisfy customers’ QoS prerequisites, the latter implicitly characterizing the specific WISP’s market penetration factor. Multiple WISPs are engaged in a competitive process towards improving their market penetration factor and attracting increased number of customers via offering different dynamic pricing policies to their candidate customers. As such they are looking for new and smart ways of sharpening their service offerings and of strengthening their role in the service-provider ecosystem.

Specifically, the basic contributions of our proposed approach and framework in this paper are summarized as follows:

  1. 1.

    A machine learning mechanism is adopted by the customers who act as learning automata and via the learning process select the most appropriate WISP to be served by. The iterative aspect of machine learning is important in such a dynamic environment because as models are exposed to new data, they are able to independently adapt. Considering a time-slotted system the machine learning based WISP selection algorithm determines at the beginning of each time slot the user’s provider choice, while after this process is completed, a power allocation framework is realized towards determining users’ optimal transmission powers within each WISP.

  2. 2.

    We introduce a holistic approach of utility-based uplink transmission power allocation in multi-service wireless communication networks. Each mobile customer is associated with an appropriately designed utility function representing his degree of satisfaction with respect to the allocated system’s resources and depending on the imposed pricing policy of the selected WISP. The form of the considered utility function is a generic one that closely models users’ Quality of Service (QoS) prerequisites and the imposed pricing policy by the WISP.

  3. 3.

    Based on the non-cooperative nature of the resource allocation problem, a maximization problem of each user’s utility function is formulated and confronted as a non-cooperative game. To deal with this problem while simultaneously treat the adoption of multiple utility functions required to support the multi-service QoS prerequisites and consider the various imposed pricing policies, we follow an approach based on the quasi-concavity of mobile customers’ utility functions in order to conclude to a unique Nash equilibrium point.

  4. 4.

    A distributed and iterative algorithm is proposed towards determining the optimal power allocation. The output of the resource allocation problem feeds the learning system in order to build knowledge and conclude to the optimal provider selection, even if dynamically changes apply to the system as it evolves. A two-stage iterative algorithm is proposed in order to realize the machine learning provider selection and the distributed resource allocation.

  5. 5.

    The proposed framework enables the autonomic user-centric management and optimization. The joint dynamic provider selection and power resource management in competitive wireless communication markets supports self-* properties, e.g., self-optimization, self-adaptation, etc. and these properties enable the users themselves to conclude to their best response strategies.

  6. 6.

    Detailed numerical results are provided that demonstrate the performance and operational effectiveness and efficiency of the proposed framework, along with its flexibility and adaptability under various scenarios. Finally, the performance of the proposed framework is compared against other related state of the art approaches (i.e. [16, 20]) and its superiority in terms of achieved energy efficiency is demonstrated.

The outline of the paper is as follows. In Section 2 various considered models and assumptions at different operation levels are introduced, including the system model (Section 2.1), the different pricing policies (section 2.2), the adopted utility functions (Section 2.3) and the WISP market penetration model (Section 2.4). Then in Section 3 the overall joint provider selection and resource management problem in the competitive wireless market environment is treated based on the above models and assumptions. Specifically, the overall proposed framework design is introduced in Section 3.1, while in Section 3.2 the provider selection problem is formulated and solved via a machine learning based approach, and in Section 3.3 the corresponding resource management problem is addressed via a game theoretic framework. In Section 4, a distributed two-step algorithm is proposed towards implementing the provider selection process and determining the optimal power allocation, while Section 5 contains the performance evaluation of the overall framework. Finally, Section 6 concludes the paper.

2 Models and preliminaries

2.1 System model

A single cell non-orthogonal multiple access (NOMA) wireless network is considered, while its uplink communication is studied. Within the cell, we assume that J WISPs provide multiple types of services to the mobile customers and have full coverage of the cell. Let us denote by \( \mathrm{J}=\left\{1,2,3\right\} \) the set of WISPs, where we have assumed three different WISPs and each one adopts one differentiated pricing policy, i.e., homogeneous, topology-based and QoS-based, as they will be discussed in detail in Section 2.2. Let us denote by \( \mathrm{N}=\left\{1,\dots, \mathrm{i},\dots, \mathrm{N}\right\} \) the set of mobile customers, which consists of the mobile customers requesting real and non-real time services and their sets are \( {\mathrm{N}}_{\mathrm{RT}} \) and \( {\mathrm{N}}_{\mathrm{N}\mathrm{RT}} \), respectively. Thus we have \( \mathrm{N}={\mathrm{N}}_{\mathrm{RT}}\cup {\mathrm{N}}_{\mathrm{N}\mathrm{RT}} \). The mobile customers requesting non-real time services are characterized by delay-tolerant data demand (e.g., video uploading, email, etc.) while those requesting real-time services impose strict short term QoS constraints (e.g., online gaming, voice, video conference, etc.)

Mobile customer’s uplink transmission power is denoted by p i and due to his physical and hardware limitations it is upper and lower bounded, i.e., \( {p}_i^{Min}\le {p}_i\le {p}_i^{Max} \). The channel gain of each mobile customer \( i, i\in \mathrm{N} \) to the base station is denoted by G i . Without loss of generality, the channel gains are sorted as G N  ≤  …  ≤ G i  ≤  …  ≤ G 1. Based on the NOMA scheme, the Successive Interference Cancellation (SIC) technique is performed at each mobile customer. Therefore, the signal of the mobile customers with the best channel gain is demodulated first, thus the mobile customers with worse channel gain are able to remove the demodulated signals based on SIC technique and sense less interference.

Moreover, let W[Hz] be the system’s spreading bandwidth, p‐i denote the transmission power vector of the rest of the mobile customers in the cell and I 0[W] be the Additive White Gaussian Noise (AWGN) power representing the background noise. The fundamental performance measure that verifies the mobile customers’ QoS prerequisites’ satisfaction is the signal-to-interference-plus-noise ratio (SINR). The received SINR of mobile customer \( i, i\in \mathrm{N}={\mathrm{N}}_{\mathrm{RT}}\cup {\mathrm{N}}_{\mathrm{N}\mathrm{RT}} \) at the base station can be written as [14]:

$$ {\gamma}_i\left({p}_i,{\mathrm{p}}_{\hbox{-} \mathrm{i}}\right)=\frac{W}{R_{ser}}\frac{G_i{p}_i}{\sum_{k= i+1}^N{G}_k{p}_k+{I}_0} $$
(1)

where R ser denotes mobile customer’s fixed transmission, while its specific value depends on the requested service by the mobile customer and \( \frac{W}{R_{ser}} \) denotes the processing gain.

2.2 Pricing model and policies

In the following we assume that every WISP dynamically imposes a pricing policy (e.g., homogeneous, topology-based or QoS-based) to the mobile customers that are being served by it, considering the system’s resources that they consume (i.e., transmission power) and the interference that their transmissions cause to the overall network. The imposed pricing policy is represented by the pricing factor c i , which is upper and lower bounded, i.e., \( {c}_i^{Min}\le {c}_i\le {c}_i^{Max} \), due to wireless market existing regulations. It is noted that in this paper the pricing factor is assumed dimensionless, however it can be easily mapped and normalized to monetary units for further market analysis, which however is not the scope of the paper. For practical purposes each WISP dynamically organizes mobile customers in classes based on different criteria according to the adopted pricing policy and the customers belonging to the same class are homogeneously addressed by the WISP that eventually announces a common price to the users of the same class per time slot.

In this paper, we consider and study three different pricing policies, as follows:

  • Homogeneous pricing policy. The WISP announces per time slot a common and universal usage-based price to all mobile customers residing in its coverage area without discriminating them, as presented in Fig. 1a.

  • Topology-based pricing policy. In this approach, the WISP considers mobile customers’ distance from the base station (BS) as the basic criterion towards penalizing them for their resource consumption and the interference that they cause to the rest of the users in the network. Thus, the WISP organizes the users in different zones with relevance to their distance from the BS and a common optimal price is determined for each zone, as shown in Fig. 1b. Such a discrimination is justified and motivated by the argumentation that channel conditions are worse for the more distant customers compared to the less distant ones, and thus these users tend to reach higher uplink transmission power levels in order to achieve a satisfactory QoS level. Consequently, their battery life is exhausted faster while higher levels of interference are imposed to the rest of the customers.

Fig. 1
figure 1

Pricing policies in competitive wireless communication markets

  • QoS-based pricing policy. Users in emerging wireless networks have the ability to request different services with various QoS characteristics and competitive spectrum consumption needs. Based on their QoS requirements, the users appropriately adjust their uplink transmission power in a selfish manner, thus inducing interference within the cell. Therefore, an appropriate pricing policy should impose a more social behavior driving them towards sparingly consuming the overall system’s resources. Under this observation, in QoS-based pricing scheme the WISP organizes mobile customers in pricing classes based on the type of their requested services and their corresponding QoS prerequisites. In this paper, we have considered two service classes consisting of mobile customers that request real (RT) and non-real time (NRT) services, as shown in Fig.1c. It should be clarified that though we consider two service classes for demonstration purposes the analysis can be easily extended to multiple classes.

2.3 Utility functions

The concept of utility function has been adopted from the field of economics in order to represent mobile customers’ perceived satisfaction from: (i) the imposed pricing policy by the WISP, (ii) the resource allocation and (iii) the fulfillment of their QoS prerequisites. A net utility function is adopted by each mobile customer, which consists of two parts: (a) the pure utility function and (b) the pricing function [15]. For both types of requested services considered (real time and non-real time), mobile customer’s pure perceived satisfaction is reflected via the ratio of the achievable data rate to the corresponding consumed uplink transmission power. Thus, mobile customer’s satisfaction increases if he achieves high data rate and low transmission power. The latter results in extending mobile customer’s battery life and causing decreased interference to the rest of the mobile customers.

Specifically, for the mobile customers requesting real time services (i.e., RT mobile customers), the satisfaction from the achieved data rate is formulated as a sigmoidal-like function, i.e., R ser  ⋅ f i (γ i ), with respect to the SINR γ i . The efficiency function f i (γ i ) represents the successfully transmitted bits from the mobile customer to the base station and it is a continuous strictly increasing and sigmoidal function with respect to the SINR γ i . For analysis purposes, a common efficiency function proposed in the literature [15, 17, 20] is adopted, i.e., \( f\left({\gamma}_i\right)={\left(1-{e}^{- A{\gamma}_i}\right)}^M \), where A, M are positive parameters. Each RT mobile customer is characterized by a target SINR value, i.e., \( {\gamma}_i^{t \arg et} \). If the target SINR value is achieved, then mobile customer’s QoS prerequisites are fulfilled. Thus, the target SINR value is mapped to the inflection point of the sigmoidal function R ser  ⋅ f i (γ i ). The notion of the latter formulation is that if mobile customer’s achieved rate is lower than \( {R}_{ser}\cdot {f}_i\left({\gamma}_i^{t \arg et}\right) \), then his perceived satisfaction decreases rapidly acting as an alert to the system that more system’s resources should be allocated to the specific customer. In contrast, if the achieved data rate is greater than \( {R}_{ser}\cdot {f}_i\left({\gamma}_i^{t \arg et}\right) \), then mobile customer’s satisfaction increases slightly, due to the fact that his QoS expectations have already been fulfilled. On the other hand, the perceived satisfaction of mobile customers requesting non-real time services (i.e., NRT mobile customers) with respect to the achieved data rate is formulated as a logarithmic function with respect to the efficiency function f i (γ i ), i.e., R ser  ⋅ log(1 + f i (γ i )). The latter formulation is adopted in order to capture NRT mobile customers’ greedy behavior to achieve even increased data rate. Summarizing the above analysis, mobile customer’s net utility function can be expressed as follows:

$$ {U}_i^{NET}\left({\mathrm{p}}_i,{\mathrm{p}}_{\hbox{-} \mathrm{i}},{\mathrm{c}}_i\right)=\left\{\begin{array}{l}\kern1em \frac{R_{ser}\cdot {f}_i\left({\gamma}_i\right)}{p_i}-{\mathrm{c}}_i\cdot {e}^{p_i},\kern0.5em i\in {\mathrm{N}}_{\mathrm{RT}}\\ {}\frac{R_{ser}\cdot \log \left(1+{f}_i\left({\gamma}_i\right)\right)}{p_i}-{\mathrm{c}}_i\cdot {e}^{p_i},\kern0.5em i\in {\mathrm{N}}_{\mathrm{N}\mathrm{RT}}\end{array}\right. $$
(2)

The pricing function adopted in mobile customer’s net utility function (2) is a convex function with respect to mobile customer’s uplink transmission power p i . This formulation has been sophisticatedly selected in order fairness to be achieved among mobile customers with respect to price. More specifically, mobile customers that transmit with high power are more penalized compared to the mobile customers that transmit with low power due to the increased interference that they cause to the overall wireless network [15, 16].

2.4 WISP market penetration model

Mobile customer’s decision on WISP selection is affected by its imposed pricing policy, which is expressed via the pricing factor c i included in customer’s utility function (2), as well as by WISP’s penetration and competitiveness in the wireless communication market. The competitiveness of a WISP \( j,\mathrm{j}\in \mathrm{J} \) is expressed via its penetration to the wireless communication market, which is translated to the total customers’ achieved data rate over the total achieved data rate in the examined wireless network, i.e., \( \frac{\sum_{\begin{array}{l} i=1\\ {} j\end{array}}^N{R}_{ach, i}}{\sum_{\begin{array}{l} i=1\\ {}\forall j\in \mathrm{J}\end{array}}^N{R}_{ach, i}} \), where R ach , i  = R ser  ⋅ f i (γ i ) for the RT mobile customers and R ach , i  = R ser  ⋅ log(1 + f i (γ i )) for the NRT customers.

3 Joint provider selection & resource management problem

3.1 Framework design

As mentioned before in this paper the problem of joint PROvider Selection and REsource MAnagement (PROSREMA) in competitive wireless communication markets is addressed. Every time slot (or for practical purposes for every window of certain number of timeslots), each mobile customer selects the WISP that will be served from based on a machine learning framework. The mobile customers act as learning automata gaining knowledge and experience from their past actions. They are able to intelligently sense the environment, while keeping history of their decisions in order to make more advantageous actions in the future, as time evolves. The necessary information in order to take their decision is their transmission powersp and the corresponding perceived utilities UNET at the previous time slot t. Furthermore, apart from these parameters, the users consider also the competitiveness of each WISP j, i.e., reward function r j (t) (as it will be defined and discussed in detail in Section 3.2) in order to conclude to their final decision and action a(t). Given mobile customers’ actions in terms of WISP selection, a distributed non-cooperative power control game among them is performed every time slot in order to determine their optimal uplink transmission powers and their corresponding utilities. Therefore, an overall cycle of dynamic provider selection and power resource management is realized. The above described overall procedure is performed iteratively in the time, as presented in Fig. 2, and it is explained in detail in the following sections 3.2 and 3.3.

Fig. 2
figure 2

Wireless competitive communication market as a learning system

3.2 Provider selection based on machine learning

The overall wireless network consisting of J WISPs and N mobile customers can be studied as a learning system, where customers act as learning automata that react with the environment in order to decide which WISP to select in order to be served.

Fig.2 presents the examined wireless network as a learning system and the relationship among the learning automata and their environment. Each mobile customer / learning automaton \( i, i\in \mathrm{N} \) at each operation time slot has a set of action a(t) = {a 1, …,a J }, which represents the different choices of WISP selection, wherefrom he will be served. Towards making their decision, the mobile customers consider the output set β(t) = {UNET(t), p(t)} of their environment, where UNET(t) and p(t) are the vectors of utility and power consumption of all customers. The output β(t) = {UNET(t), p(t)} is determined via performing the resource management, which will be analyzed in the next subsection. The solution of the resource management problem will be the mobile customers’ optimal uplink transmission power. Based on the mobile customers’ chosen actions and the corresponding reaction of the environment, we are able to determine the reward probability r j (t), which is associated with action a j . The reward probability represents the competitiveness of the jth WISP, i.e., \( {r}_j(t)=\frac{\sum_{\begin{array}{l} i=1\\ {} j\end{array}}^N{R}_{ach, i}}{\sum_{\begin{array}{l} i=1\\ {}\forall j\in \mathrm{J}\end{array}}^N{R}_{ach, i}} \) The action probability vector of mobile customer / learning automaton \( i, i\in \mathrm{N} \) is Pri(t) = {Pr i , 1(t),  … , Pr i , J(t)}, where Pr i , j(t) expresses the probability of selecting the jth WISP and following the model of learning automata, and it is updated as follows:

$$ { \Pr}_{i, j}\left( t+1\right)={ \Pr}_{i, j}(t)- b\cdot {r}_j(t)\cdot { \Pr}_{i, j}(t),\kern0.75em {j}^{\left( t+1\right)}\ne {j}^{(t)} $$
(3a)
$$ { \Pr}_{i,\mathrm{j}}\left( t+1\right)={ \Pr}_{i,\mathrm{j}}(t)+ b\cdot {r}_j(t)\cdot \left(1-{ \Pr}_{i,\mathrm{j}}(t)\right),{j}^{\left( t+1\right)}={j}^{(t)} $$
(3b)

where b , 0 < b < 1 is a step size parameter that controls the convergence time of the learning process. The impact of this parameter on the convergence time of the algorithm is numerically studied later in Section 5.4. Eq. (3a) represents mobile customer’s action probability to select a different WISP j (t + 1) compared to the one in time slot t, i.e., j (t), while eq. (3b) reflects the probability of continuing the customer to be served by the same WISP, i.e.,  j (t + 1) = j (t).

Considering the initialization point of WISP selection, unless some more “educated” and/or historical information is available, we assume that the overall wireless network has no prior knowledge of the reward probability r j (t), 0 ≤ r j (t) ≤ 1 and the action probability Pri(t) = {Pr i , 1(t),  … , Pr i , J(t)}. Therefore, the initial selection of WISP by the mobile customers could be made with equal probability, i.e., \( { \Pr}_{i,\mathrm{j}}\left( t=0\right)=\frac{1}{J} \). Finally, it is noted that the mobile customers converge to the most cost-efficient and service trustworthy WISP in a long-term period. The description of the provider selection machine learning algorithm is presented as part of the overall PROSPERA algorithm in Section 4.

3.3 Power management

Given the WISP selection, each customer aims at determining his optimal uplink transmission power \( {p}_i^{\ast } \) in order to maximize his perceived satisfaction, as expressed in eq. (2). Therefore, the aforementioned mobile customers’ goal is formulated as a distributed maximization problem of each customer’s utility function with respect to his uplink transmission power as follows:

$$ \begin{array}{c}\hfill \underset{p_i}{ \max }{U}_i^{NET}\left({\mathrm{p}}_i,{\mathbf{p}}_{\hbox{-} \mathbf{i}},{\mathrm{c}}_i\right),\hfill \\ {}\hfill s. t.\kern1.25em {p}_i^{Min}\le {p}_i\le {p}_i^{Max}\hfill \end{array}\forall i\in \mathrm{N} $$
(4)

where \( {p}_i\in {\mathrm{P}}_{\mathrm{i}} \).

Considering the distributed nature of the optimization problem (4) and mobile customers’ selfish behavior, a game theoretic approach is adopted towards determining mobile customers’ optimal transmission power vector \( {\mathrm{p}}^{\ast }=\left\{{p}_1^{\ast },{p}_2^{\ast },\dots, {p}_i^{\ast },\dots, {p}_N^{\ast}\right\} \). Let us denote by \( G=\left\{\mathrm{N},{\mathrm{P}}_{\mathrm{i}},{U}_i^{NET}\right\} \) the non-cooperative power allocation game, where N is the set of players, i.e., mobile customers, \( {\mathrm{P}}_{\mathrm{i}} \) is the strategy space of the ith customer and \( {U}_i^{NET} \) its corresponding utility. The concept of Nash equilibrium is adopted towards seeking analytically the solution of the non-cooperative power allocation game. Towards proving the existence and uniqueness of Nash equilibrium in the non-cooperative power allocation game, we should prove that mobile customer’s utility function \( {U}_i^{NET} \) is quasi-concave with respect to p i [17].

Definition 1: A function \( {U}_i^{NET} \) is strictly quasi-concave, if for any pair of distinct points p i and p i in the convex domain \( {\mathrm{P}}_{\mathrm{i}} \) and for 0 < λ < 1:

$$ {U}_i^{NET}\left({{\mathrm{p}}_i}^{\prime}\right)>{U}_i^{NET}\left({\mathrm{p}}_i\right)\Rightarrow {U}_i^{NET}\left(\lambda \cdot {\mathrm{p}}_i+\left(1-\lambda \right)\cdot {{\mathrm{p}}_i}^{\prime}\right)>{U}_i^{NET}\left({\mathrm{p}}_i\right) $$

Based on Definition 1, any concave function is quasi-concave.

Theorem 1: Mobile customer’s \( i, i\in \mathrm{N} \) utility function is quasi-concave in the modified strategy space \( {{\mathrm{P}}_{\mathrm{i}}}^{\prime } \) corresponding to the SINR interval \( {\gamma}_{\iota}\in \left(\frac{ \ln M}{A},{\gamma}_{final}\right) \), where γ final  = min {γ RT , γ NRT } and \( {\gamma}_{RT},{\gamma}_{\mathrm{N} RT}\in \left(\frac{ \ln M}{A},\frac{ \ln M}{A}\right) \), thus the Nash equilibrium of the non-cooperative power allocation game \( G=\left\{\mathrm{N},{\mathrm{P}}_{\mathrm{i}},{U}_i^{NET}\right\} \) exists and it is unique in the corresponding strategy space.

Proof: See Appendix A.

By definition, the Nash equilibrium of the non-cooperative power allocation game has to satisfy,

$$ {p}_i^{\ast }={BR}_i\left(\mathbf{p}\right)\underset{p_i\in {\mathrm{P}}_{\mathrm{i}}}{= argmax}{U}_i^{NET}\left({p}_i,{\mathbf{p}}_{-\mathrm{i}},{c}_i\right) $$
(5)

where BR i (p) denotes the best response function of each mobile customer \( i, i\in \mathrm{N} \). The mobile customers adopt the best response dynamics and the game \( G=\left\{\mathrm{N},{\mathrm{P}}_{\mathrm{i}},{U}_i^{NET}\right\} \) converges to its Nash equilibrium, i.e., power vector \( {\mathrm{p}}^{\ast }=\left\{{p}_1^{\ast },\dots, {p}_i^{\ast },\dots, {p}_N^{\ast}\right\} \), if mobile customer’s best response function is standard [18].

A function is characterized as standard if it satisfies the properties of: (a) positivity, (b) monotonicity, and (c) scalability for all p ≥ 0, where p = {p 1,  … , p i ,  … , p N }. These properties can be easily verified for BR(p) in the non-cooperative power allocation game since,

  1. a.

    Positivity: p > 0, thus BR(p) > 0;

  2. b.

    Monotonicity: if p  ≥ p then via eq. (5) we conclude that BR(p ) ≥ BR(p);

  3. c.

    Scalability: for all μ > 1, then via eq. (5) we conclude that λ ⋅ BR(p) ≥ BR(λ ⋅ p).

Thus, the global convergence to the non-cooperative power allocation game’s Nash equilibrium under the proposed best response function, given by eq. (5), is guaranteed.

4 PROSREMA algorithm

In this section, we propose a two-step algorithm towards implementing the provider selection and determining the optimal power resource allocation processes, as described above. The first part of the algorithm is based on the machine learning framework and it is responsible to determine the provider that each user selects to be served from. It is noted that the provider selection algorithm runs once at the beginning of each time slot or alternatively in a window of time slots if the networking environment conditions do not change rapidly. On the other hand, the second part of the proposed two-step algorithm is responsible to determine the optimal power allocation in a distributed manner. The resource allocation part of the two-step algorithm runs every time slot and it needs several iterations in order to converge.

PROSREMA algorithm

  1. Step 1

    (Initialization): At the beginning of the first time slot, i.e., t = 0, set the initial provider selection probability vector Pri(t = 0) as \( { \Pr}_{i,\mathrm{j}}\left( t=0\right)=\frac{1}{J} \), \( \forall i\in \mathrm{N} \). Afterwards, each mobile customer chooses a provider according to his provider selection probability vector Pri(t = 0).

  2. Step 2

    (Provider Selection): At every time slot t > 0, each mobile customer chooses a provider to be served from, according to his provider selection probability vector Pri(t) provided in relations (3a) and (3b).

  3. Step 3

    (Resource Allocation): Given that all customers have chosen a provider and the providers announce via broadcasting their imposed pricing, i.e., c i , then:

  4. Step 3a:

    Set ite = 0, where ite denotes the iteration of the resource allocation part of the algorithm. The base station broadcasts the overall interference in the network and each mobile customer determines his sensed interference.

  5. Step 3b:

    Each mobile customer determines his optimal uplink transmission power in accordance to (5).

  6. Step 3:

    If \( \left|{p}_i^{\left( ite+1\right)}-{p}_i^{(ite)}\right|\le \varepsilon \) (ε: small positive constant), the powers have converged and stop. Otherwise, return to step 3a.

  7. Step 4:

    (Provider Selection): Given the optimal power allocation, each provider can measure his competitiveness, which is the reward probability \( {r}_j(t)=\frac{\sum_{\begin{array}{l} i=1\\ {} j\end{array}}^N{R}_{ach, i}}{\sum_{\begin{array}{l} i=1\\ {}\forall j\in \mathrm{J}\end{array}}^N{R}_{ach, i}} \) and broadcasts its value to the customers.

  8. Step 5:

    (Provider Selection): Each mobile customer updates his provider selection probability vector via the following rule, where 0 < b < 1 is a step size parameter:

$$ { \Pr}_{i, j}\left( t+1\right)={ \Pr}_{i, j}(t)- b\cdot {r}_j(t)\cdot { \Pr}_{i, j}(t),\kern0.75em {j}^{\left( t+1\right)}\ne {j}^{(t)} $$
$$ { \Pr}_{i,\mathrm{j}}\left( t+1\right)={ \Pr}_{i,\mathrm{j}}(t)+ b\cdot {r}_j(t)\cdot \left(1-{ \Pr}_{i,\mathrm{j}}(t)\right),{j}^{\left( t+1\right)}={j}^{(t)} $$

It should be clarified that PROSREMA algorithm can be characterized as a low complexity algorithm, due to the simplicity of the calculations (i.e., closed forms) that it performs. Also, as it will be shown in detail in the section of numerical results, if the wireless communication conditions do not change rapidly, the provider selection probabilities converge fast in terms of necessary time-slots, i.e., there exists a probability Pr i , j (t) which is larger than a value approaching one (e.g., 0.999).

5 Performance evaluation

In this section we study the performance and operational characteristics of our proposed framework via modeling and simulation. Specifically, in Section 5.1 the simulation framework and corresponding assumptions are presented, while in Section 5.2 the various scenarios under consideration in this performance evaluation study are described. Following, in Section 5.3 we concentrate on the operation and performance of the proposed framework under a series of different simulation scenarios, while in Section 5.4 we study the convergence of the PROSREMA algorithm. Finally, in Section 5.5 we conduct a comparative study of this work against two other approaches in the literature, where pricing policies have been utilized towards improving resource allocation schemes in wireless networks, demonstrating the superiority of the proposed PROSREMA approach in terms of energy efficiency.

5.1 Simulation assumptions

For demonstration purposes, we consider the uplink of a NOMA single cell with N = 30 continuously backlogged users, placed within a radius range of R0 = 1000 m. Users request Real Time (RT) or Non-Real Time (NRT) services, assuming target rates of 128 kbps for the first, and feasible rates of more than 360 kbps for the latter. Users, depending on their requested service type, are alternatively placed within the topology with increasing distance from the base station coordinates. For both user classes, each user’s physical constraint towards transmission power is set at \( {p}_i^{Max} \) =0.2 W, while the user’s path gain is modeled as \( {G}_i=\raisebox{1ex}{${K}_i$}\!\left/ \!\raisebox{-1ex}{${d}_i^n$}\right. \)where Ki represents the shadow effect being a log normal distributed random variable with mean 0 and variance of σ2 = 8 dB, di denotes the distance of the user from the respective base station that he is linked to, whereas n refers to the path loss exponent (n = 4). Additionally, for demonstration purposes we adopted the following efficiency function\( f\left({\gamma}_i\right)={\left(1-{e}^{-3.7{\gamma}_i}\right)}^{80} \), while the results were attained by adjusting the step size parameter 0 < b < 1 for various values.

In the following we consider the co-existence of three different WISPs following respectively the three different pricing policies introduced in Section 2.2. For simplicity and fairness considerations we assume that all three WISPs announce their pricing policies through the base station which is positioned in the same coordinates at the center of the network (0,0). In the rest of the paper, we denote as WISP 1, 2 and 3, the WISP that announces the homogeneous, QoS-based and topology-based pricing policy, respectively.

5.2 Evaluation scenarios

As an appropriate initialization and reference point, we assume that initially each WISP supports 10 users (or 33.33% of user share), thus initially the users are equivalently allocated among the available WISPs. Hence, via the PROSREMA algorithm, the users, based on their degree of satisfaction from their transmission and service experience, have the potential at the end of each timeslot to either stay with the same WISP or switch to another WISP in order to further maximize their utilities and better exploit the offered pricing policies.

In order to investigate the efficiency of the application of the PROSREMA algorithm in assisting the users to appropriately select a WISP according to their requirements, different simulation scenarios have been designed and studied, in an evolving manner. In more detail, originally all users transmit accepting the pricing policy of the WISP that were originally allocated to (Scenario A). As a next step, as the system evolves the users based on the competitiveness of their WISP and their associated reward probability r j (t), determine whether to switch to a new WISP or not for the upcoming timeslots (Scenario B). Also, targeting at testing the proposed scheme under fiercer competitive conditions, we assume that WISP 1 (who in previous timeslots applied a homogeneous pricing policy) at some point selects to reduce his overall price levels as a step that enhances competition, while the other two WISPs do not alter their previously announced pricing policies (Scenario C). Finally, depending on the outcome of this scenario, we allow then all WISPs to engage in competition by modifying their prices attempting to win a higher share of the network’s user allocation share (Scenario D). The value of the step size parameter is b = 0.6, considering the results presented in Table 1, Figs. 3 and 4.

Table 1 User share among WISPs and sum rate increase compared to the equal user share (Scenario A)
Fig. 3
figure 3

Sum rate for RT, NRT and overall system users under Scenarios A-D

Fig. 4
figure 4

Sum rate per WISP and the overall system under Scenarios A-D

5.3 Performance analysis

In Table 1, the user share among the WISPs is presented under the aforementioned four different examined scenarios (i.e., scenario A-D), along with the achieved sum rate percentage increase with reference to scenario A, i.e., equal distribution of user shares among the three WISPs. Analyzing the results in Table 1, we observe that after the implementation of Scenario B, the users universally rejected WISP 1 that applied high homogeneous prices, with all of them selecting to switch to one of the other two WISPs, concluding at a share of 0.00% for the WISP 1. The vast majority of the users (including some users originally allocated to WISP 2 that applied QoS-based pricing), opted to switch to WISP 3 due to the topology-based pricing, since this scheme better fits and adapts to their transmission requirements in this scenario. Thus, WISP 3’s user share almost doubled from 33.33% to 63.33%. Moreover, the new user allocation had a positive influence on the overall system’s performance, since the total achieved throughput increased by 67.79% compared to the initial equal user share among the WISPs (scenario A).

Under Scenario C, WISP 1 in an attempt to mitigate the significant losses from the previous scenarios, readjusts his prices to lower levels in order to attract some of the users. Please note that for this scenario, we assume that WISPs 2 and 3 continue to charge their users under exactly the same conditions as before. As a result, under Scenario C we observe that WISP 1 manages to reclaim some of the users by increasing his share to 20.00% mostly against WISP 2, whose share has been reduced to lower levels, even lower than the original allocation (from Scenario A). On the contrary, WISP 3 still maintained the highest user share by serving half of the users, while the overall achieved throughput was further increased (by 71.79% compared to Scenario A).

Lastly, under Scenario D, all WISPs have the ability to readjust their prices, thus leading to a more balanced allocation of the users per provider. Also under these conditions, WISP 3 still maintains the lead in terms of attracted users, whereas the overall sum rate for the system was once more increased by 73.36% compared to the initial Scenario A. From all the above, it becomes obvious that the PROSREMA algorithm allows the users to dynamically adapt to the existing conditions within the network depending on the topology, service, or based on the imposed pricing policies, allowing the system to self-optimize its overall performance. The above observations are further solidified by the results illustrated in Figs. 3 and 4 illustrating the sum rate for both classes of users (i.e., RT and NRT mobile customers), as well as for the different WISPs and the network as a whole, under all the four previously described scenarios. Observing both Figs. 3 and 4, it becomes apparent that gradually, through the implementation of the PROSREMA algorithm and the freedom provided to the users to select the provider of their preference, the overall achieved throughput for the system increases considerably, allowing the exchange of more data among the users for the examined timeslots.

Additionally, the potential to dynamically exchange WISPs enhances competition among the service providers who are encouraged to adjust their imposed prices closer to the requirements of the users, hence leading to a fairer and more efficient allocation of the network’s resources. More specifically, since the introduction of Scenario B and onwards where the WISPs readjust their prices to more rational levels in order to maintain their shares, the users are allowed to claim higher fractions of the network’s available bandwidth.

5.4 Convergence numerical study

The step size parameter b (please refer to relations (3a) and (3b) in Section 3.2 and step 5 of the algorithm) consists an important factor for the system with respect to the convergence time of the PROSREMA algorithm towards the determination of the preferred service provider by the users. Fig. 5 presents the average number of timeslots required for all users to converge to the preferred WISP j with probability close to 1, i.e.,Pr i , j = 0.999, for various values of the parameter b. It can be easily observed, that for higher values of b, fewer timeslots are necessary for convergence leading to a reduction of up to 90.57% in the number of timeslots when b = 0.9 compared to the case where b = 0.1.

Fig. 5
figure 5

Number of timeslots for PROSREMA convergence as a function of the step size parameter b

The above observation is further confirmed via the results presented in Figs. 6a, c, that depict the evolution of the choice probabilities of a randomly selected mobile customer i when selecting one of the three available providers, as a function of the required timeslots. The evolution of the choice probabilities of mobile customer i is presented under various step size parameters, i.e., b = 0.2, b = 0.5 and b = 0.9. For presentation purposes we examine the behavior of user 14 (di = 425 m from the base station), who under Scenarios A and B selects to remain with the WISP 2. Similar results are observed for other users as well. It can be observed that the probability of selecting WISP 2 from the initial probability of Pr14 , 2(t = 0) = 0.33 converges to Pr14 , 2(t = t final ) = 1 regardless of the value of b, while the probabilities of the other two WISPs (1 and 3) in all cases converge to zero, since the user selects not to switch his service provider. However, it can be reaffirmed that the number of the required timeslots towards convergence decreases with higher values of the parameter b requiring only 10 timeslots for convergence when b = 0.9. It should be noted here that in principle if the environment changes slowly then a large value of b would be an appropriate selection due to the fast convergence, while if we have a rapidly changing environment then a smaller value of b would be required, as for higher values of b suboptimal choices about the WISP selection may be obtained in this case.

Fig. 6
figure 6

WISP selection probability under various step size parameters b of user i = 14

Considering the convergence of the power control mechanism as presented in Steps 3, 3a-3c of the PROSREMA algorithm in Section 4, it is very fast since less than thirty iterations are required for reaching equilibrium for all users, starting from randomly selected initial values of uplink transmission power. PROSREMA algorithm was tested and evaluated in an Intel (R) Core (TM) 2 DUO CPU T7500 @ 2.20GHz laptop with 2.00 GBytes available RAM and the runtime of the power control mechanism was less than 0.4 msec. The necessary time in order to converge to the Nash equilibrium point is of similar order of magnitude with the duration of a timeslot (0.5 msec), and therefore can be easily adopted in a realistic scenario. It should be also noted that the convergence time of PROSREMA algorithm can further decrease, if we adopt a more “educated” implementation of the algorithm, i.e., use as initial values of the powers the corresponding values in the previous time slot.

5.5 Comparative results

Next, we provide a comparative study of the proposed PROSREMA framework, considering the aforementioned Scenario B, against two other fundamental works in the literature, where pricing users’ resources was utilized as a basic tool towards enhancing the system’s efficiency and promoting the fairness among the network’s users. In [20], the authors applied universal linear pricing to all mobile customers with regards to user’s transmission power, whereas in [16], a linear universally applied pricing policy as function of the SINR has been adopted. It is noted that for comparison purposes, both research works, i.e., [16, 20] were simulated under the NOMA transmission technique. At this point it should be noted that in this work a convex pricing policy as a function of user’s transmission power has been selected, as discussed in Section 2.3. Convex pricing appears to be a more pragmatic approach against linear pricing techniques, since the harm imposed by a user to his neighbors is not equivalent within the whole range of the feasible transmission power value sets. Moreover, we performed a detailed Monte Carlo analysis over random users’ positions, where we examined 10,000 random network topologies.

Specifically, Figs. 7 and 8, (note that y-axis is in logarithmic scale) illustrate for all the three approaches under consideration (i.e. PROSREMA, [16, 20]) the achieved energy efficiency of each user as a function of his distance from the base station, for Real Time users (Fig. 7) and Non Real Time users (Fig. 8). It is clearly observed that PROSREMA algorithm achieves higher energy efficiency values compared to both [20] and [16] for both classes of users. This outcome stems from the flexibility that is provided to the users to select among a set of different pricing policies (i.e., homogeneous, QoS-based, topology-based). On the other hand, [20, 16] adopt only a homogeneous pricing philosophy, thus offering fewer degrees of freedom to the users, who are not able to adjust the imposed pricing according to their QoS prerequisites, topology characteristics, etc. Hence, PROSREMA algorithm successfully exploits a higher portion of the system’s available bandwidth, leading to a significant increase in the achievable energy efficiency levels, reaching an improvement of up to 20.68% compared to the other approaches, allowing more data bits to be transmitted for less consumed energy.

Fig. 7
figure 7

Energy Efficiency for RT users as a function of their distance from the base station (y-axis in logarithmic scale)

Fig. 8
figure 8

Energy Efficiency for NRT users as a function of their distance from the base station (y-axis in logarithmic scale)

6 Conclusions

In this paper the problem of dynamic provider selection and power resource management in competitive wireless communication markets is studied. Each mobile customer is associated with a well-designed holistic utility function, capturing his perceived satisfaction from the power allocation and WISP selection. The mobile customers act as learning automata, who sense their environment and take the best decision regarding the WISP to be served from. A distributed non-cooperative power control game among the mobile customers is introduced towards determining their optimal uplink transmission power in order to meet their QoS demands. A distributed iterative joint provider selection and resource management (PROSREMA) algorithm is proposed combining the machine learning provider selection process and the distributed power resource management. The performance of the proposed framework has been thoroughly evaluated via modeling, simulation and comparative evaluation, and its operational effectiveness is demonstrated. Part of our current and future work includes the exploitation of mobile customers’ social related information, due to the fact that users with similar social characteristics and interests usually present common behaviors. Furthermore, the machine learning process can be further enriched in terms of input information from the environment via a more sophisticated resource management procedure, where multiple resources, e.g., user’s uplink transmission power and rate, are allocated to the users, thus determining more accurately and in a holistic manner their perceived satisfaction [21, 22].