
1 Introduction

Current wireless communication systems generally adopt various types of orthogonal multiple access (OMA) technologies for serving multiple users, such as time division multiple access (TDMA), frequency division multiple access (FDMA), and code division multiple access (CDMA), where one resource block is exclusively allocated to one mobile user (MU) to avoid multiuser interference. In practice, OMA technologies are relatively easy to implement, albeit at the cost of low spectral efficiency. Recently, with the rapid development of the mobile Internet and the proliferation of mobile devices, future wireless communication systems are expected to support massive connectivity, which is an extremely challenging task for OMA technologies with limited radio resources. In response, non-orthogonal multiple access (NOMA) has been proposed as a promising access technology for fifth-generation (5G) mobile communication systems, owing to its potential to achieve high spectral efficiency and support massive access [1,2,3,4].

The principle of NOMA is to exploit the power domain to simultaneously serve multiple MUs using the same radio resources [5,6,7], with the aid of sophisticated successive interference cancellation (SIC) receivers [8, 9]. Despite the adoption of SIC, inter-user interference persists for all MUs except the one with the strongest channel gain, which limits the overall system performance [10]. To address this issue, power allocation has been considered an effective method for managing multiuser interference [11, 12]. Since the overall performance is limited by the MUs with weak channel conditions, it is intuitive to allocate more power to the weak MUs and less power to the strong MU, so as to enhance the effective channel gain of the weak MUs and reduce the interference they experience [13]. For the specific two-user case, the optimal power allocation scheme was studied in [14], [15] proposed two sub-optimal power allocation schemes exploiting the Karush–Kuhn–Tucker (KKT) conditions, and the quality of service (QoS) requirements of NOMA systems were investigated in [16]. For an arbitrary number of users, the computational complexity of performing SIC increases substantially and the design of the optimal power allocation becomes intractable. To facilitate an effective system design, user clustering and user pairing have been proposed [17, 18]. Generally speaking, multiple MUs with distinctive channel gains are selected to form a cluster, within which SIC is conducted to mitigate the interference [19, 20]. A small cluster consisting of a few MUs implies low SIC complexity but leads to high inter-cluster interference. Thus, it makes sense to dynamically adjust the cluster size according to performance requirements and system parameters, so as to balance implementation complexity against interference mitigation [21]. However, dynamic clustering alone cannot remove the inter-cluster interference, which indicates the necessity of combining dynamic clustering with efficient interference mitigation schemes.

It is well known that multiple-antenna technology is a powerful means of interference mitigation [22,23,24,25] and hence can be naturally applied to NOMA systems [26, 27]. In [28], the authors proposed a beamforming scheme for combating inter-cluster and intra-cluster interference in a NOMA downlink, where the base station (BS) was equipped with multiple antennas and each MU had a single antenna. A more general setup was considered in [29], where both the BS and the MUs are multiple-antenna devices. By exploiting multiple antennas at the BS and the MUs, a signal alignment scheme was proposed to mitigate both the intra-cluster and inter-cluster interference. It is worth pointing out that both of the above schemes require full channel state information (CSI) at the BS, which is usually difficult and costly to obtain in practice. To circumvent the difficulty of CSI acquisition, random beamforming was adopted in [30], which inevitably leads to a performance loss. Alternatively, the work in [31] suggested employing zero-forcing (ZF) detection at the multiple-antenna MUs for inter-cluster interference cancelation. However, the ZF scheme requires that the number of antennas at each MU be greater than the number of antennas at the BS, which is in general impractical.

To effectively realize the potential benefits of multiple-antenna techniques, the amount and quality of CSI available at the BS play a key role [32, 33]. In practice, the CSI can be obtained in several ways. For instance, in time division duplex (TDD) systems, the BS can obtain the downlink CSI by estimating the uplink channel and leveraging channel reciprocity [34]. In frequency division duplex (FDD) systems, by contrast, the downlink CSI is usually first estimated and quantized at the MUs, and then conveyed back to the BS via a feedback link [35]. In both practical TDD and FDD systems, the BS has access to only partial CSI. As a result, there will be residual inter-cluster and intra-cluster interference. To the best of the authors’ knowledge, previous works consider only the two extreme cases of full CSI and no CSI, and the design, analysis, and optimization of multiple-antenna NOMA systems with partial CSI remain an uncharted area. Motivated by this, we present a comprehensive study of the impact of partial CSI on the design, analysis, and optimization of multiple-antenna NOMA downlink communication systems.

The rest of this chapter is organized as follows: Sect. 7.2 gives a brief introduction to the considered NOMA downlink communication system and designs the corresponding multiple-antenna transmission framework. Section 7.3 first analyzes the average transmission rates in the presence of imperfect CSI and then proposes three performance optimization schemes. Section 7.4 derives the average transmission rates in two extreme cases through asymptotic analysis and presents some system design guidelines. Section 7.5 provides simulation results to validate the effectiveness of the proposed schemes. Finally, Sect. 7.6 concludes this chapter.

2 System Model and Framework Design

Consider a downlink communication scenario in a single-cell system, where a base station (BS) broadcasts messages to multiple MUs, cf. Fig. 7.1. The BS is equipped with M antennas, while each MU has a single antenna due to size limitations.

Fig. 7.1 A multiuser NOMA communication system with 4 clusters

2.1 User Clustering

To strike a balance between system performance and computational complexity in NOMA systems, it is necessary to carry out user clustering. User clustering can be designed from different perspectives. For instance, a signal-to-interference-plus-noise ratio (SINR) maximization user clustering scheme was adopted in [36], and quasi-orthogonal MUs were selected to form a cluster in [37]. These schemes essentially perform user clustering by exhaustive search, resulting in high implementation complexity. In this chapter, we design a simple user clustering scheme based on spatial direction information.Footnote 1 Specifically, MUs in the same direction but with distinctive propagation distances are grouped into a cluster. On the one hand, since the MUs in a cluster share the same direction, a single beam can nearly align with all of them, which facilitates the mitigation of the inter-cluster interference and enhances the effective channel gain. On the other hand, a large gap between propagation distances avoids severe inter-user interference and enables more accurate SIC at the MUs [38,39,40]. If two MUs are close to each other with almost equal channel gains, they can be assigned to different clusters by improving the spatial resolution, i.e., by increasing the number of spatial beams and the number of BS antennas. Without loss of generality, we assume that the MUs are grouped into N clusters with K MUs in each cluster. To facilitate the following presentation, we use \(\alpha _{n,k}^{1/2}\mathbf {h}_{n,k}\) to denote the M-dimensional channel vector from the BS to the kth MU in the nth cluster, where \(\alpha _{n,k}\) is the large-scale channel fading, and \(\mathbf {h}_{n,k}\) is the small-scale channel fading vector with i.i.d. zero-mean, unit-variance complex Gaussian entries. It is assumed that \(\alpha _{n,k}\) remains constant for a relatively long period, while \(\mathbf {h}_{n,k}\) remains unchanged within a time slot but varies independently across time slots.

2.2 CSI Acquisition

For the TDD mode, the BS obtains the downlink CSI through uplink channel estimation. Specifically, at the beginning of each time slot, the MUs simultaneously send pilot sequences of \(\tau \) symbols to the BS, and the received pilot at the BS can be expressed as

$$\begin{aligned} \mathbf {Y}_P=\sum \limits _{n=1}^{N}\sum \limits _{k=1}^{K}\sqrt{\tau P_{n,k}^P\alpha _{n,k}}\mathbf {h}_{n,k}\varvec{\varPhi }_{n,k}+\mathbf {N}_P, \end{aligned}$$
(7.1)

where \(P_{n,k}^P\) is the transmit power for the pilot sequence of the kth MU in the nth cluster, \(\mathbf {N}_P\) is an additive white Gaussian noise (AWGN) matrix with i.i.d. zero-mean, unit-variance complex Gaussian entries, and \(\varvec{\varPhi }_{n,k}\in \mathbb {C}^{1\times \tau }\) is the pilot sequence sent by the kth MU in the nth cluster. To guarantee pairwise orthogonality, i.e., \(\varvec{\varPhi }_{n,k}\varvec{\varPhi }_{i,j}^H=0\), \(\forall (n,k)\ne (i,j)\), and \(\varvec{\varPhi }_{n,k}\varvec{\varPhi }_{n,k}^H=1\), the pilot length must satisfy \(\tau \ge NK\). By making use of this pairwise orthogonality, the received pilot can be transformed into

$$\begin{aligned} \mathbf {Y}_P\varvec{\varPhi }_{n,k}^H=\sqrt{\tau P_{n,k}^P\alpha _{n,k}}\mathbf {h}_{n,k}+\mathbf {N}_P\varvec{\varPhi }_{n,k}^H. \end{aligned}$$
(7.2)

Then, by using minimum mean squared error (MMSE) estimation, the relation between the actual channel vector \(\mathbf {h}_{n,k}\) and its estimate \(\hat{\mathbf {h}}_{n,k}\) can be expressed as

$$\begin{aligned} \mathbf {h}_{n,k}=\sqrt{\rho _{n,k}}\hat{\mathbf {h}}_{n,k}+\sqrt{1-\rho _{n,k}}\mathbf {e}_{n,k}, \end{aligned}$$
(7.3)

where \(\mathbf {e}_{n,k}\) is the channel estimation error vector with i.i.d. zero-mean, unit-variance complex Gaussian entries, independent of \(\hat{\mathbf {h}}_{n,k}\). The variable \(\rho _{n,k}=\frac{\tau P_{n,k}^P\alpha _{n,k}}{1+\tau P_{n,k}^P\alpha _{n,k}}=1-\frac{1}{1+\tau P_{n,k}^P\alpha _{n,k}}\) is the correlation coefficient between \(\mathbf {h}_{n,k}\) and \(\hat{\mathbf {h}}_{n,k}\). A larger \(\rho _{n,k}\) implies more accurate channel estimation. Thus, the CSI accuracy can be improved by increasing the pilot transmit power \(P_{n,k}^P\) or the pilot length \(\tau \).
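As a quick numerical sanity check of (7.3), the sketch below simulates a single antenna of the despread pilot (7.2) and applies the scalar MMSE estimator; the empirical mean squared error should be close to \(1-\rho _{n,k}\). It uses only the Python standard library, and the function names, the single-antenna reduction, and the parameter values are illustrative assumptions rather than part of the chapter.

```python
import math
import random


def tdd_rho(tau, pilot_power, alpha):
    """Correlation coefficient rho = tau*P*alpha / (1 + tau*P*alpha) from (7.3)."""
    snr = tau * pilot_power * alpha
    return snr / (1.0 + snr)


def crandn(rng):
    """Zero-mean, unit-variance circularly symmetric complex Gaussian sample."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def mmse_error_variance(tau, pilot_power, alpha, trials=50_000, seed=1):
    """Monte Carlo estimate of E|h - h_mmse|^2 for one antenna of (7.2)."""
    rng = random.Random(seed)
    snr = tau * pilot_power * alpha
    mse = 0.0
    for _ in range(trials):
        h = crandn(rng)                            # one entry of h_{n,k}
        y = math.sqrt(snr) * h + crandn(rng)       # despread pilot, unit-variance noise
        h_mmse = math.sqrt(snr) / (1.0 + snr) * y  # scalar MMSE estimate (sqrt(rho) * h_hat)
        mse += abs(h - h_mmse) ** 2
    return mse / trials
```

For example, with \(\tau P_{n,k}^P\alpha _{n,k}=1\) the formula gives \(\rho _{n,k}=0.5\), and the simulated error variance should come out near 0.5; doubling \(\tau \) raises \(\rho _{n,k}\).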

For the FDD mode, the CSI is usually conveyed from the MUs to the BS through a feedback link. Since the feedback link is rate-constrained, the CSI at the MUs must first be quantized. Specifically, the kth MU in the nth cluster chooses the optimal codeword from a predetermined quantization codebook \(\mathcal {B}_{n,k}=\{\tilde{\mathbf {h}}_{n,k}^{(1)},\dots ,\tilde{\mathbf {h}}_{n,k}^{(2^{B_{n,k}})}\}\) of size \(2^{B_{n,k}}\), where \(\tilde{\mathbf {h}}_{n,k}^{(j)}\) is the jth unit-norm codeword and \(B_{n,k}\) is the number of feedback bits. Mathematically, the codeword selection criterion is given by

$$\begin{aligned} j^{\star }=\arg \max \limits _{1\le j\le 2^{B_{n,k}}}\left| \mathbf {h}_{n,k}^{H}\tilde{\mathbf {h}}_{n,k}^{(j)}\right| ^2. \end{aligned}$$
(7.4)

Then, the MU conveys the index \(j^{\star }\) to the BS using \(B_{n,k}\) feedback bits, and the BS recovers the quantized CSI \(\tilde{\mathbf {h}}_{n,k}^{(j^{\star })}\) from the same codebook. In other words, with a codebook-based feedback scheme the BS obtains only the phase (direction) information of the channel. However, as shown below, the phase information is sufficient for the design of spatial beamforming. Similarly to (7.3), the relation between the actual CSI and the obtained CSI in FDD mode can be approximated as [41, 42]

$$\begin{aligned} \tilde{\mathbf {h}}_{n,k}=\sqrt{\rho _{n,k}}\tilde{\mathbf {h}}_{n,k}^{\star }+\sqrt{1-\rho _{n,k}}\tilde{\mathbf {e}}_{n,k}, \end{aligned}$$
(7.5)

where \(\tilde{\mathbf {h}}_{n,k}=\frac{\mathbf {h}_{n,k}}{\Vert \mathbf {h}_{n,k}\Vert }\) is the phase (direction) of the channel \(\mathbf {h}_{n,k}\), \(\tilde{\mathbf {h}}_{n,k}^{\star }\) is the quantized phase information, \(\tilde{\mathbf {e}}_{n,k}\) is the uniformly distributed quantization error vector, and \(\rho _{n,k}=1-2^{-\frac{B_{n,k}}{M-1}}\) is the associated correlation coefficient, i.e., the CSI accuracy. Thus, for a given number M of BS antennas, the CSI accuracy can be improved by enlarging the quantization codebook, i.e., by increasing the number of feedback bits \(B_{n,k}\).
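The sketch below is a hypothetical illustration of the selection rule (7.4) and of the accuracy \(\rho _{n,k}=1-2^{-B_{n,k}/(M-1)}\). It assumes random vector quantization (RVQ), i.e., a codebook of independent, isotropically distributed unit-norm vectors, which the chapter does not specify; for RVQ, \(2^{-B/(M-1)}\) is a known upper bound on the average quantization error \(1-|\tilde{\mathbf {h}}_{n,k}^H\tilde{\mathbf {h}}_{n,k}^{\star }|^2\), so the simulated value should fall slightly below it. Names and parameters are illustrative.

```python
import math
import random


def fdd_rho(bits, m):
    """CSI accuracy rho = 1 - 2^(-B/(M-1)) from (7.5)."""
    return 1.0 - 2.0 ** (-bits / (m - 1))


def random_unit_vector(m, rng):
    """Isotropically distributed unit-norm vector in C^m."""
    v = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(m)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / norm for x in v]


def rvq_distortion(bits, m, trials=500, seed=7):
    """Monte Carlo average of 1 - |h_dir^H c*|^2 under RVQ (assumed codebook).

    A fresh random codebook of 2**bits unit-norm codewords is drawn in every
    trial, and the codeword is chosen with the selection rule (7.4).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        h_dir = random_unit_vector(m, rng)  # channel direction h / ||h||
        codebook = [random_unit_vector(m, rng) for _ in range(2 ** bits)]
        best = max(abs(sum(a.conjugate() * b for a, b in zip(h_dir, c))) ** 2
                   for c in codebook)
        total += 1.0 - best
    return total / trials
```

With M = 4 and B = 6, for instance, the bound \(1-\rho _{n,k}\) equals 0.25, and the Monte Carlo average should land a little below it.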

2.3 Superposition Coding and Transmit Beamforming

Based on the available CSI, the BS constructs one transmit beam for each cluster, so as to mitigate or even completely cancel the inter-cluster interference. To strike a balance between system performance and implementation complexity, we adopt zero-forcing beamforming (ZFBF) at the BS. We take the design of beam \(\mathbf {w}_i\) for the ith cluster as an example. First, we construct a complementary matrix \(\bar{\mathbf {H}}_i\)Footnote 2 as:

$$\begin{aligned} \bar{\mathbf {H}}_i=[\hat{\mathbf {h}}_{1,1},\ldots ,\hat{\mathbf {h}}_{1,K},\ldots ,\hat{\mathbf {h}}_{i-1,K},\hat{\mathbf {h}}_{i+1,1},\ldots ,\hat{\mathbf {h}}_{N,K}]^H. \end{aligned}$$
(7.6)

Then, we perform singular value decomposition (SVD) on \(\bar{\mathbf {H}}_i\) and obtain its right singular vectors \(\mathbf {u}_{i,j}, j=1,\ldots ,N_u,\) corresponding to the zero singular values, where \(N_u\) is the number of zero singular values. Finally, we design the beam as \(\mathbf {w}_i=\sum \nolimits _{j=1}^{N_u}\theta _{i,j}\mathbf {u}_{i,j}\), where \(\theta _{i,j}>0\) is a weight satisfying \(\sum \nolimits _{j=1}^{N_u}\theta _{i,j}^2=1\), so that \(\mathbf {w}_i\) has unit norm. Thus, the received signal at the kth MU in the nth cluster is given by

$$\begin{aligned} y_{n,k}= & {} \sqrt{\alpha _{n,k}}\mathbf {h}_{n,k}^H\sum \limits _{i=1}^{N}\mathbf {w}_is_i+n_{n,k}\nonumber \\= & {} \sqrt{\alpha _{n,k}}\mathbf {h}_{n,k}^H\mathbf {w}_ns_n+\sqrt{\alpha _{n,k}(1-\rho _{n,k})}\mathbf {e}_{n,k}^H\sum \limits _{i=1,i\ne n}^{N}\mathbf {w}_is_i+n_{n,k}, \end{aligned}$$
(7.7)

where \(s_i=\sum \nolimits _{j=1}^{K}\sqrt{P_{i,j}^S}s_{i,j}\) is the superposition coded signal with \(P_{i,j}^S\) and \(s_{i,j}\) being the transmit power and transmit signal for the jth MU in the ith cluster, and \(n_{n,k}\) is the AWGN with unit variance. In general, \(P_{i,j}^S\) should be carefully allocated to distinguish the MUs in the power domain, as discussed in detail below. Note that Eq. (7.7) holds because \(\mathbf {h}_{n,k}^H\mathbf {w}_i=\sqrt{\rho _{n,k}}\hat{\mathbf {h}}_{n,k}^H\mathbf {w}_i+\sqrt{1-\rho _{n,k}}\mathbf {e}_{n,k}^H\mathbf {w}_i=\sqrt{1-\rho _{n,k}}\mathbf {e}_{n,k}^H\mathbf {w}_i\), \(\forall i\ne n\), for ZFBF in TDD mode.Footnote 3 With perfect CSI at the BS, i.e., \(\rho _{n,k}=1\), the inter-cluster interference can be completely canceled.
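The following sketch illustrates the ZFBF design and the leakage identity behind (7.7). For simplicity it replaces the SVD step with an equivalent Gram-Schmidt projection: the beam is a random vector projected onto the null space of \(\bar{\mathbf {H}}_i\), so the combining weights are random rather than the positive \(\theta _{i,j}\) of the text, which does not affect the zero-forcing property. It assumes TDD-style imperfect CSI following (7.3) and checks that the average leakage \(|\mathbf {h}_{n,k}^H\mathbf {w}_i|^2\), \(i\ne n\), is close to \(1-\rho _{n,k}\), as expected from \(\mathrm {E}[|\mathbf {e}_{n,k}^H\mathbf {w}_i|^2]=1\) for a unit-norm beam independent of the error. All names and parameter values are illustrative.

```python
import math
import random


def crandn(rng):
    """Zero-mean, unit-variance circularly symmetric complex Gaussian sample."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def inner(a, b):
    """Hermitian inner product a^H b."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def zf_beam(other_channels, m, rng):
    """Unit-norm w with h^H w = 0 for every h in other_channels (null space of H_bar_i)."""
    basis = []  # orthonormal basis of span{other channels}, built by Gram-Schmidt
    for h in other_channels:
        v = list(h)
        for q in basis:
            c = inner(q, v)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in v))
        if norm > 1e-10:  # skip linearly dependent channels
            basis.append([x / norm for x in v])
    if len(basis) >= m:
        raise ValueError("empty null space: need M > rank(H_bar_i)")
    # Project a random vector onto the orthogonal complement of that span.
    w = [crandn(rng) for _ in range(m)]
    for q in basis:
        c = inner(q, w)
        w = [wi - c * qi for wi, qi in zip(w, q)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in w))
    return [x / norm for x in w]


def mean_leakage(rho, m=4, n_clusters=3, k_users=1, trials=3000, seed=5):
    """Average |h_{n,k}^H w_i|^2 over i != n, with beams zero-forced on estimates."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(trials):
        h_hat = [[[crandn(rng) for _ in range(m)] for _ in range(k_users)]
                 for _ in range(n_clusters)]
        beams = [zf_beam([h for j, c in enumerate(h_hat) if j != i for h in c], m, rng)
                 for i in range(n_clusters)]
        for n in range(n_clusters):
            for hh in h_hat[n]:
                e = [crandn(rng) for _ in range(m)]
                # True channel from the estimation model (7.3).
                h = [math.sqrt(rho) * a + math.sqrt(1.0 - rho) * b for a, b in zip(hh, e)]
                for i in range(n_clusters):
                    if i != n:
                        total += abs(inner(h, beams[i])) ** 2
                        count += 1
    return total / count
```

With \(\rho _{n,k}=1\) the leakage vanishes up to rounding, while with \(\rho _{n,k}=0.8\) it averages roughly 0.2.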

2.4 Successive Interference Cancellation

Although ZFBF at the BS mitigates part of the inter-cluster interference, intra-cluster interference from the other MUs in the same cluster still exists. To improve the received signal quality, each MU performs SIC according to the principle of NOMA. Without loss of generality, we assume that the effective channel gains in the ith cluster are ordered as follows:

$$\begin{aligned} |\sqrt{\alpha _{i,1}}\mathbf {h}_{i,1}^H\mathbf {w}_i|^2\ge \cdots \ge |\sqrt{\alpha _{i,K}}\mathbf {h}_{i,K}^H\mathbf {w}_i|^2. \end{aligned}$$
(7.8)

We reasonably assume that the BS learns the MUs’ effective channel gains through channel quality indicator (CQI) messages and then determines the user order in (7.8). Thus, in the ith cluster, the jth MU can always decode the lth MU’s signal, \(\forall l>j\), if the lth MU can decode its own signal. As a result, the jth MU can successively subtract the signals of these weaker MUs from its received signal before decoding its own signal. After SIC, the SINR at the kth MU in the nth cluster can be expressed as

$$\begin{aligned} \gamma _{n,k}=\frac{\alpha _{n,k}|\mathbf {h}_{n,k}^H\mathbf {w}_n|^2P_{n,k}^{S}}{\underbrace{\alpha _{n,k}|\mathbf {h}_{n,k}^H\mathbf {w}_n|^2\sum \nolimits _{j=1}^{k-1}P_{n,j}^{S}}_{\text {Intra-cluster interference}}+\underbrace{\alpha _{n,k}(1-\rho _{n,k})\sum \nolimits _{i=1,i\ne n}^{N}|\mathbf {e}_{n,k}^H\mathbf {w}_i|^2\sum \nolimits _{l=1}^{K}P_{i,l}^S}_{\text {Inter-cluster interference}}+\underbrace{1}_{\text {AWGN}}}, \end{aligned}$$
(7.9)

where the first term in the denominator of (7.9) is the residual intra-cluster interference after SIC at the MU, the second is the residual inter-cluster interference after ZFBF at the BS, and the third is the AWGN. The 1st MU in each cluster experiences no intra-cluster interference, since it decodes and cancels the signals of all the other MUs in its cluster (the sum over j in (7.9) is empty for \(k=1\)). Note that in this chapter we assume that perfect SIC can be performed at the MUs. In practical NOMA systems, SIC might be imperfect due to the limited computational capability of the MUs, so residual intra-cluster interference from the weaker MUs exists even after SIC [43]. The impact of imperfect SIC on the system performance is beyond the scope of this chapter and is left for future work. Moreover, the transmit power has a significant impact on SIC and on the performance of NOMA [44]. Thus, in the following sections, we quantitatively analyze the impact of the transmit power and then optimize it to improve the performance.
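A direct transcription of (7.9) can help when checking numbers by hand. The sketch below assumes the effective gains \(|\mathbf {h}_{n,k}^H\mathbf {w}_n|^2\) and \(|\mathbf {e}_{n,k}^H\mathbf {w}_i|^2\) have already been computed (for example, by simulation), uses 0-based indices for clusters and MUs, and takes the argument names as hypothetical.

```python
def sinr(n, k, alpha, rho, gain_own, gains_cross, powers):
    """SINR of the kth MU (0-based) in cluster n, following (7.9).

    alpha, rho  : large-scale fading alpha_{n,k} and CSI accuracy rho_{n,k}
    gain_own    : |h_{n,k}^H w_n|^2
    gains_cross : {i: |e_{n,k}^H w_i|^2} for the other clusters i != n
    powers      : powers[i][l] = P_{i,l}^S (0-based cluster and MU indices)
    """
    p_own = powers[n]
    signal = alpha * gain_own * p_own[k]
    # Signals of the stronger MUs 0..k-1 are not cancelled by MU k.
    intra = alpha * gain_own * sum(p_own[:k])
    # Residual leakage of each other cluster's total power through the CSI error.
    inter = alpha * (1.0 - rho) * sum(
        g * sum(powers[i]) for i, g in gains_cross.items() if i != n)
    return signal / (intra + inter + 1.0)  # unit-variance AWGN
```

With perfect CSI (\(\rho _{n,k}=1\)) the inter-cluster term vanishes, and for the first MU (index 0 here) the intra-cluster sum is empty, so the SINR reduces to \(\alpha _{n,1}|\mathbf {h}_{n,1}^H\mathbf {w}_n|^2P_{n,1}^S\).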

3 Performance Analysis and Optimization

In this section, we concentrate on performance analysis and optimization of multi-antenna NOMA downlink with imperfect CSI. Specifically, we first derive closed-form expressions for the average transmission rates of the 1st MU and the other MUs, and then propose separate and joint optimization schemes of transmit power, feedback bits, and transmit mode, so as to maximize the average sum rate of the system.

3.1 Average Transmission Rate

We start by analyzing the average transmission rate of the kth MU in the nth cluster. First, we consider the case \(k>1\). According to the definition, the corresponding average transmission rate can be computed as

$$\begin{aligned} R_{n,k}= & {} \mathrm {E}\left[ \log _2\left( 1+\gamma _{n,k}\right) \right] \nonumber \\= & {} \mathrm {E}\left[ \log _2\left( \frac{\alpha _{n,k}|\mathbf {h}_{n,k}^H\mathbf {w}_n|^2\sum \nolimits _{j=1}^{k}P_{n,j}^{S}+\alpha _{n,k}(1-\rho _{n,k})\sum \nolimits _{i=1,i\ne n}^{N}|\mathbf {e}_{n,k}^H\mathbf {w}_i|^2\sum \nolimits _{l=1}^{K}P_{i,l}^S+1}{\alpha _{n,k}|\mathbf {h}_{n,k}^H\mathbf {w}_n|^2\sum \nolimits _{j=1}^{k-1}P_{n,j}^{S}+\alpha _{n,k}(1-\rho _{n,k})\sum \nolimits _{i=1,i\ne n}^{N}|\mathbf {e}_{n,k}^H\mathbf {w}_i|^2\sum \nolimits _{l=1}^{K}P_{i,l}^S+1}\right) \right] \nonumber \\= & {} \mathrm {E}\left[ \log _2\left( \alpha _{n,k}|\mathbf {h}_{n,k}^H\mathbf {w}_n|^2\sum \nolimits _{j=1}^{k}P_{n,j}^{S}+\alpha _{n,k}(1-\rho _{n,k})\sum \nolimits _{i=1,i\ne n}^{N}|\mathbf {e}_{n,k}^H\mathbf {w}_i|^2\sum \nolimits _{l=1}^{K}P_{i,l}^S+1\right) \right] \nonumber \\&-\mathrm {E}\left[ \log _2\left( \alpha _{n,k}|\mathbf {h}_{n,k}^H\mathbf {w}_n|^2\sum \nolimits _{j=1}^{k-1}P_{n,j}^{S}+\alpha _{n,k}(1-\rho _{n,k})\sum \nolimits _{i=1,i\ne n}^{N}|\mathbf {e}_{n,k}^H\mathbf {w}_i|^2\sum \nolimits _{l=1}^{K}P_{i,l}^S+1\right) \right] .\nonumber \\ \end{aligned}$$
(7.10)

Note that the average transmission rate in (7.10) is the difference of two expectations of similar form, so we concentrate on the derivation of the first one. For notational convenience, we use W to denote the term \(\alpha _{n,k}|\mathbf {h}_{n,k}^H\mathbf {w}_n|^2\sum \nolimits _{j=1}^{k}P_{n,j}^{S}+\alpha _{n,k}(1-\rho _{n,k})\sum \nolimits _{i=1,i\ne n}^{N}|\mathbf {e}_{n,k}^H\mathbf {w}_i|^2\sum \nolimits _{l=1}^{K}P_{i,l}^S\). The key to computing the first expectation is the probability density function (pdf) of W. Consider the first random variable \(|\mathbf {h}_{n,k}^H\mathbf {w}_n|^2\) in W: since the unit-norm beam \(\mathbf {w}_n\) is designed independently of \(\mathbf {h}_{n,k}\), \(|\mathbf {h}_{n,k}^H\mathbf {w}_n|^2\) is \(\chi ^2\) distributed with 2 degrees of freedom [45]. Similarly, \(|\mathbf {e}_{n,k}^H\mathbf {w}_i|^2\) also follows the \(\chi ^2(2)\) distribution. Therefore, W is a weighted sum of N random variables with \(\chi ^2(2)\) distribution. According to [46], the pdf of W is a nested finite weighted sum of N Erlang pdfs, given by

$$\begin{aligned} f_W(x)=\sum \limits _{i=1}^{N}\varXi _N\left( i,\{\eta _{n,k}^q\}_{q=1}^N\right) g(x,\eta _{n,k}^i), \end{aligned}$$
(7.11)

where

$$ \eta _{n,k}^q=\left\{ \begin{array}{ll} \alpha _{n,k}\sum \limits _{j=1}^{k}P_{q,j}^{S} &{} \text {if}\, q=n\\ \alpha _{n,k}(1-\rho _{n,k})\sum \limits _{l=1}^{K}P_{q,l}^S &{} \text {if}\, q\ne n \end{array} \right. , $$
$$\begin{aligned} g(x,\eta _{n,k}^i)=\frac{1}{\eta _{n,k}^i}\exp \left( -\frac{x}{\eta _{n,k}^i}\right) , \end{aligned}$$
$$\begin{aligned} \varXi _N\left( i,\{\eta _{n,k}^q\}_{q=1}^N\right) =\frac{(-1)^{N-1}\eta _{n,k}^i}{\prod \nolimits _{l=1}^{N}\eta _{n,k}^l}\prod \limits _{s=1}^{N-1}\left( \frac{1}{\eta _{n,k}^i}-\frac{1}{\eta _{n,k}^{s+\mathbf {U}(s-i)}}\right) ^{-1}, \end{aligned}$$

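and \(\mathbf {U}(x)\) is the unit step function, i.e., \(\mathbf {U}(x)=1\) for \(x\ge 0\) and \(\mathbf {U}(x)=0\) otherwise. Note that the weights \(\varXi _N\) are constants for given \(\{\eta _{n,k}^q\}_{q=1}^{N}\), and that (7.11) requires the \(\eta _{n,k}^q\) to be pairwise distinct, since otherwise the factors \(\left( 1/\eta _{n,k}^i-1/\eta _{n,k}^{s+\mathbf {U}(s-i)}\right) ^{-1}\) are undefined. Hence, the first expectation in (7.10) can be computed as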

$$\begin{aligned} \mathrm {E}[\log _2(1+W)]= & {} \int _0^{\infty }\log _2(1+x)f_W(x)dx\nonumber \\= & {} \sum \limits _{i=1}^{N}\varXi _N\left( i,\{\eta _{n,k}^q\}_{q=1}^N\right) \int _0^{\infty }\log _2(1+x)\frac{1}{\eta _{n,k}^i}\exp \left( -\frac{x}{\eta _{n,k}^i}\right) dx\nonumber \\= & {} -\frac{1}{\ln (2)}\sum \limits _{i=1}^{N}\varXi _N\left( i,\{\eta _{n,k}^q\}_{q=1}^N\right) \exp \left( \frac{1}{\eta _{n,k}^i}\right) \mathrm {E_i}\left( -\frac{1}{\eta _{n,k}^i}\right) , \end{aligned}$$
(7.12)

where \(\mathrm {E_i}(x)=\int _{-\infty }^{x}\frac{\exp (t)}{t}dt\) is the exponential integral function. Equation (7.12) follows from [47, Eq. (4.3372)]. Similarly, we use V to denote \(\alpha _{n,k}|\mathbf {h}_{n,k}^H\mathbf {w}_n|^2\sum \nolimits _{j=1}^{k-1}P_{n,j}^{S}+\alpha _{n,k}(1-\rho _{n,k})\sum \nolimits _{i=1,i\ne n}^{N}|\mathbf {e}_{n,k}^H\mathbf {w}_i|^2\sum \nolimits _{t=1}^{K}P_{i,t}^S\) in the second term of (7.10). Thus, the second expectation term can be computed as

$$\begin{aligned} \mathrm {E}[\log _2(1+V)]=-\frac{1}{\ln (2)}\sum \limits _{i=1}^{N}\varXi _N\left( i,\{\beta _{n,k}^v\}_{v=1}^N\right) \exp \left( \frac{1}{\beta _{n,k}^i}\right) \mathrm {E_i}\left( -\frac{1}{\beta _{n,k}^i}\right) , \end{aligned}$$
(7.13)

where

$$ \beta _{n,k}^v=\left\{ \begin{array}{ll} \alpha _{n,k}\sum \limits _{j=1}^{k-1}P_{v,j}^{S} &{} \text {if}\, v=n\\ \alpha _{n,k}(1-\rho _{n,k})\sum \limits _{l=1}^{K}P_{v,l}^S &{} \text {if}\, v\ne n \end{array} \right. . $$
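The closed forms (7.11) to (7.13) can be checked numerically. The sketch below assumes each \(\chi ^2(2)\) term in W is normalized to a unit-mean exponential (consistent with \(\mathrm {E}[|\mathbf {e}_{n,k}^H\mathbf {w}_i|^2]=1\) used later in (7.17)), treats the terms as independent, and requires pairwise distinct weights. It uses the algebraically equivalent form \(\varXi _N(i,\cdot )=\prod _{s\ne i}\eta _{n,k}^i/(\eta _{n,k}^i-\eta _{n,k}^s)\) and the identity \(-\exp (x)\mathrm {E_i}(-x)=\exp (x)E_1(x)\) for \(x>0\), with \(E_1\) evaluated by a power series or a continued fraction in the style of the Numerical Recipes `expint` routine. Only the Python standard library is used; the function names are illustrative.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def exp_e1(x):
    """exp(x) * E1(x) for x > 0, where E1(x) = -Ei(-x)."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    if x <= 1.0:  # power series E1(x) = -gamma - ln x - sum (-x)^i / (i * i!)
        total, term, i = -math.log(x) - EULER_GAMMA, 1.0, 1
        while True:
            term *= -x / i  # term = (-x)^i / i!
            delta = -term / i
            total += delta
            if abs(delta) < 1e-16 * abs(total):
                return math.exp(x) * total
            i += 1
    b = x + 1.0  # Lentz continued fraction, valid for x > 1
    c, d = 1e30, 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -float(i * i)
        b += 2.0
        d = 1.0 / (an * d + b)
        c = b + an / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h  # the continued fraction directly equals exp(x) * E1(x)


def xi_weights(etas):
    """Weights Xi_N(i, {eta}) of (7.11); the etas must be pairwise distinct."""
    weights = []
    for i, eta_i in enumerate(etas):
        w = 1.0
        for s, eta_s in enumerate(etas):
            if s != i:
                w *= eta_i / (eta_i - eta_s)
        weights.append(w)
    return weights


def mean_log2_closed_form(etas):
    """E[log2(1 + W)] from (7.12) for W = sum_q eta_q X_q, X_q ~ Exp(1)."""
    return sum(xi * exp_e1(1.0 / eta)
               for xi, eta in zip(xi_weights(etas), etas)) / math.log(2)


def mean_log2_monte_carlo(etas, trials=100_000, seed=11):
    """Monte Carlo reference value for the same expectation."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        acc += math.log2(1.0 + sum(eta * rng.expovariate(1.0) for eta in etas))
    return acc / trials
```

For a single term with \(\eta =1\), the closed form reduces to \(e\,E_1(1)/\ln 2\approx 0.8603\); for several distinct weights it should agree with the Monte Carlo value to within sampling error.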

Hence, we can obtain the average transmission rate for the kth MU in the nth cluster as follows

$$\begin{aligned} R_{n,k}= & {} \frac{1}{\ln (2)}\sum \limits _{i=1}^{N}\varXi _N\left( i,\{\beta _{n,k}^v\}_{v=1}^N\right) \exp \left( \frac{1}{\beta _{n,k}^i}\right) \mathrm {E_i}\left( -\frac{1}{\beta _{n,k}^i}\right) \nonumber \\&-\frac{1}{\ln (2)}\sum \limits _{i=1}^{N}\varXi _N\left( i,\{\eta _{n,k}^q\}_{q=1}^N\right) \exp \left( \frac{1}{\eta _{n,k}^i}\right) \mathrm {E_i}\left( -\frac{1}{\eta _{n,k}^i}\right) . \end{aligned}$$
(7.14)

Then, we consider the case \(k=1\). Since the first MU can decode all the other MUs’ signals in the same cluster, there is no intra-cluster interference. In this case, the corresponding average transmission rate reduces to

$$\begin{aligned} R_{n,1}= & {} \frac{1}{\ln (2)}\sum \limits _{i=1}^{N-1}\varXi _{N-1}\left( i,\{\beta _{n,1}^v\}_{v=1}^{N-1}\right) \exp \left( \frac{1}{\beta _{n,1}^i}\right) \mathrm {E_i}\left( -\frac{1}{\beta _{n,1}^i}\right) \nonumber \\&-\frac{1}{\ln (2)}\sum \limits _{i=1}^{N}\varXi _N\left( i,\{\eta _{n,1}^q\}_{q=1}^N\right) \exp \left( \frac{1}{\eta _{n,1}^i}\right) \mathrm {E_i}\left( -\frac{1}{\eta _{n,1}^i}\right) , \end{aligned}$$
(7.15)

where

$$ \eta _{n,1}^q=\left\{ \begin{array}{ll} \alpha _{n,1}P_{q,1}^{S} &{} \text {if}\, q=n\\ \alpha _{n,1}(1-\rho _{n,1})\sum \limits _{l=1}^{K}P_{q,l}^S &{} \text {if}\, q\ne n \end{array} \right. , $$

and

$$ \beta _{n,1}^v=\left\{ \begin{array}{ll} \alpha _{n,1}(1-\rho _{n,1})\sum \limits _{l=1}^{K}P_{v,l}^S &{} \text {if}\, v<n\\ \alpha _{n,1}(1-\rho _{n,1})\sum \limits _{l=1}^{K}P_{v+1,l}^S &{} \text {if}\, v\ge n \end{array} \right. . $$

Combining (7.14) and (7.15), it is easy to evaluate the performance of a multiple-antenna NOMA downlink with arbitrary system parameters and channel conditions. In particular, it is possible to reveal the impact of system parameters such as transmit power, CSI accuracy, and transmission mode.

3.2 Power Allocation

From (7.14) and (7.15), it is easy to observe that, with imperfect CSI, the transmit power has a great impact on the average transmission rates. On the one hand, increasing the transmit power enhances the desired signal strength. On the other hand, it also increases the interference. Thus, it is desirable to distribute the transmit power according to the channel conditions.

To maximize the sum rate of the considered multiple-antenna NOMA system subject to a total power constraint, we have the following optimization problem:

$$\begin{aligned}&J_1: \max \limits _{P_{n,k}^S}\sum \limits _{n=1}^{N}\sum \limits _{k=1}^K{R}_{n,k}\nonumber \\ \mathrm {s.t.}&\,\,\mathrm {C1:}\sum \limits _{n=1}^{N}\sum \limits _{k=1}^K{P}_{n,k}^S\le P_{tot}^S\nonumber \\&\,\,\mathrm {C2:}{P}_{n,k}^S>0, \end{aligned}$$
(7.16)

where \(P_{tot}^S\) is the maximum total transmit power budget. It is worth pointing out that in certain scenarios user fairness might be of particular importance. To guarantee user fairness, one can replace the objective function of \(J_1\) with a weighted sum rate, where the weights directly affect the power allocation and thus the MUs’ rates. Unfortunately, \(J_1\) is non-convex owing to the complicated form of its objective function, so it is difficult to obtain a closed-form expression for the optimal transmit power. As a compromise, we propose an effective power allocation scheme based on the following important observation about the multiple-antenna NOMA downlink system:

Lemma 1

The inter-cluster interference depends only on the power allocation between the clusters, while the intra-cluster interference is determined by the power allocation among the MUs in the same cluster.

Proof

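A close observation of the inter-cluster interference \(\alpha _{n,k}(1-\rho _{n,k})\sum \nolimits _{i=1,i\ne n}^{N}|\mathbf {e}_{n,k}^H\mathbf {w}_i|^2\) \(\sum \nolimits _{l=1}^{K}P_{i,l}^S\) in (7.9) shows that it depends on the powers of the ith cluster only through \(\sum \nolimits _{l=1}^{K}P_{i,l}^S\), the total transmit power of the ith cluster, so the intra-cluster power allocation does not affect the inter-cluster interference. Similarly, the intra-cluster interference \(\alpha _{n,k}|\mathbf {h}_{n,k}^H\mathbf {w}_n|^2\sum \nolimits _{j=1}^{k-1}P_{n,j}^{S}\) involves only the powers of the MUs in the nth cluster.\(\square \)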

Inspired by Lemma 1, the power allocation scheme can be divided into two steps. In the first step, the BS distributes the total power among the N clusters. In the second step, each cluster individually carries out power allocation subject to the power constraint determined by the first step. In the following, we give the details of the two-step power allocation scheme. First, we design the power allocation between the clusters from the perspective of minimizing inter-cluster interference. For the ith cluster, the average aggregate interference to the other clusters is given by

$$\begin{aligned} I_i= & {} \mathrm {E}\left[ \sum \limits _{n=1,n\ne i}^N\sum \limits _{k=1}^{K}\alpha _{n,k}(1-\rho _{n,k})|\mathbf {e}_{n,k}^H\mathbf {w}_i|^2\sum \limits _{l=1}^{K}P_{i,l}^S\right] \nonumber \\= & {} \left( \sum \limits _{n=1,n\ne i}^N\sum \limits _{k=1}^{K}\alpha _{n,k}(1-\rho _{n,k})\right) P_i^{S}, \end{aligned}$$
(7.17)

where \(P_i^{S}=\sum \nolimits _{l=1}^{K}P_{i,l}^S\) is the total transmit power of the ith cluster. Equation (7.17) follows from the fact that \(\mathrm {E}[|\mathbf {e}_{n,k}^H\mathbf {w}_i|^2]=1\). Intuitively, a large interference coefficient \(\sum \nolimits _{n=1,n\ne i}^N\sum \nolimits _{k=1}^{K}\alpha _{n,k}(1-\rho _{n,k})\) means more severe inter-cluster interference caused by the ith cluster. To mitigate the inter-cluster interference and thereby improve the average sum rate, we propose to distribute the power in proportion to the reciprocal of the interference coefficient. Specifically, the transmit power for the ith cluster is computed as

$$\begin{aligned} P_{i}^{S}=\frac{\left( \sum \nolimits _{n=1,n\ne i}^N\sum \nolimits _{k=1}^{K}\alpha _{n,k}(1-\rho _{n,k})\right) ^{-1}}{\sum \nolimits _{l=1}^{N}\left( \sum \nolimits _{n=1,n\ne l}^N\sum \nolimits _{k=1}^{K}\alpha _{n,k}(1-\rho _{n,k})\right) ^{-1}}P_{tot}^{S}. \end{aligned}$$
(7.18)

Then, we allocate the power within each cluster to further increase the average sum rate. According to the nature of NOMA techniques, the first MU not only has the strongest effective channel gain for the desired signal, but also generates weak interference to the other MUs. On the contrary, the Kth MU has the weakest effective channel gain for the desired signal and also produces strong interference to the other MUs. Thus, from the perspective of maximizing the average sum rate, it is better to allocate the power according to the following criterion:

$$\begin{aligned} P_{n,1}^S\ge \cdots \ge P_{n,k}^S\ge \cdots \ge P_{n,K}^S. \end{aligned}$$
(7.19)

On the other hand, to facilitate SIC, NOMA generally requires the transmit powers in a cluster to satisfy the following criterion [31]:

$$\begin{aligned} P_{n,1}^S\le \cdots \le P_{n,k}^S\le \cdots \le P_{n,K}^S. \end{aligned}$$
(7.20)

Under this condition, each MU performs SIC in descending order of the user index, namely in ascending order of the effective channel gain. Specifically, the kth MU cancels the interference from the Kth to the \((k+1)\)th MU in sequence. Thus, the SINR for decoding each interfering signal is as high as possible, which facilitates SIC at the MUs [44].

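Since the criteria (7.19) and (7.20) point in opposite directions and both hold simultaneously only when all MUs in a cluster receive the same power, we propose to distribute the power equally within each cluster, namely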

$$\begin{aligned} P_{n,k}^S=P_n^{S}/K. \end{aligned}$$
(7.21)

Substituting (7.18) into (7.21), the transmit power for the kth MU in the nth cluster can be computed as

$$\begin{aligned} P_{n,k}^{S} =\frac{\left( \sum \nolimits _{i=1,i\ne n}^N\sum \nolimits _{j=1}^{K}\alpha _{i,j}(1-\rho _{i,j})\right) ^{-1}}{K\left( \sum \nolimits _{l=1}^{N}\left( \sum \nolimits _{i=1,i\ne l}^N\sum \nolimits _{j=1}^{K}\alpha _{i,j}(1-\rho _{i,j})\right) ^{-1}\right) }P_{tot}^{S}. \end{aligned}$$
(7.22)

Thus, given the channel statistics and the CSI accuracy, the transmit power can be distributed according to (7.22) at very low computational complexity.
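The two-step scheme of (7.17), (7.18), and (7.21), whose combination is (7.22), needs only the statistics \(\alpha _{n,k}\) and \(\rho _{n,k}\). The following is a minimal sketch under the assumption that every interference coefficient is positive (i.e., the CSI is imperfect somewhere in the other clusters); the function names and example values are hypothetical.

```python
def interference_coefficients(alpha, rho):
    """c_i = sum_{n != i} sum_k alpha_{n,k} (1 - rho_{n,k}), the coefficient of P_i^S in (7.17)."""
    own = [sum(a * (1.0 - r) for a, r in zip(a_n, r_n)) for a_n, r_n in zip(alpha, rho)]
    total = sum(own)
    return [total - own_i for own_i in own]


def two_step_power_allocation(alpha, rho, p_tot):
    """Return P[n][k] from (7.18) (between clusters) and (7.21) (equal split inside)."""
    inv = [1.0 / c for c in interference_coefficients(alpha, rho)]  # needs every c_i > 0
    cluster_power = [p_tot * x / sum(inv) for x in inv]               # (7.18)
    return [[p / len(a_n)] * len(a_n) for p, a_n in zip(cluster_power, alpha)]  # (7.21)


# Example: N = 3 clusters, K = 2 MUs each, P_tot = 10.
alpha = [[1.0, 0.2], [0.5, 0.1], [0.8, 0.3]]
rho = [[0.9, 0.8], [0.7, 0.6], [0.95, 0.9]]
P = two_step_power_allocation(alpha, rho, 10.0)
```

In this example the third cluster (index 2) receives the least power: its interference coefficient collects the CSI errors of the MUs in the other two clusters, which are comparatively large, so its beam would leak the most interference.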

Remark 1

We note that the large-scale fading coefficients \(\alpha _{n,k}, \forall n, k\), remain constant for a relatively long time and are easy to obtain at the BS via long-term measurement. Hence, the proposed power allocation scheme incurs low system overhead and can be implemented with low complexity.

3.3 Feedback Distribution

In the FDD mode, the accuracy of quantized CSI depends on the codebook size \(2^{B_{n,k}}\), where \(B_{n,k}\) is the number of feedback bits from the kth MU in the nth cluster. As observed in (7.14) and (7.15), the interference can be reduced by increasing the number of feedback bits. However, because of the rate constraint on the feedback link, the total number of feedback bits is limited. Therefore, it is important to optimize the distribution of feedback bits among the MUs.

According to the received signal-to-interference-plus-noise ratio (SINR) in (7.9), the CSI accuracy affects only the inter-cluster interference. Thus, it makes sense to choose the feedback bits so as to minimize the average sum of inter-cluster interference, given by

$$\begin{aligned} I_{\mathrm {inter}}= & {} \mathrm {E}\left[ \sum \limits _{n=1}^{N}\sum \limits _{k=1}^{K}\alpha _{n,k}(1-\rho _{n,k})\sum \limits _{i=1,i\ne n}^{N}|\mathbf {e}_{n,k}^H\mathbf {w}_i|^2\sum \limits _{l=1}^{K}P_{i,l}^S\right] \nonumber \\= & {} \sum \limits _{n=1}^{N}\sum \limits _{k=1}^{K}\alpha _{n,k}\sum \limits _{i=1,i\ne n}^{N}P_{i}^{S}2^{-\frac{B_{n,k}}{M-1}}. \end{aligned}$$
(7.23)

Hence, the optimization problem for feedback bits distribution can be expressed as

$$\begin{aligned}&J_2: \min \limits _{B_{n,k}}\sum \limits _{n=1}^{N}\sum \limits _{k=1}^{K}\alpha _{n,k}\sum \limits _{i=1,i\ne n}^{N}P_{i}^{S}2^{-\frac{B_{n,k}}{M-1}}\nonumber \\ \mathrm {s.t.}&\,\,\mathrm {C3:}\sum \limits _{n=1}^{N}\sum \limits _{k=1}^K{B}_{n,k}\le B_{\mathrm {tot}},\nonumber \\&\,\,\mathrm {C4:}{B}_{n,k}\ge 0, \end{aligned}$$
(7.24)

where \(B_{\mathrm {tot}}\) is an upper bound on the total number of feedback bits. Since \(J_2\) is an integer programming problem, it is difficult to solve. To tackle this challenge, we relax the integer constraint on \(B_{n,k}\). In this case, by the inequality of arithmetic and geometric means, we have

$$\begin{aligned} \sum \limits _{n=1}^{N}\sum \limits _{k=1}^{K}\alpha _{n,k}\sum \limits _{i=1,i\ne n}^{N}P_{i}^{S}2^{-\frac{B_{n,k}}{M-1}}\ge & {} NK\left( \prod \limits _{n=1}^{N}\prod \limits _{k=1}^{K}\alpha _{n,k}\sum \limits _{i=1,i\ne n}^{N}P_{i}^{S}2^{-\frac{B_{n,k}}{M-1}}\right) ^{\frac{1}{NK}}\nonumber \\= & {} NK\left( 2^{-\frac{\sum \nolimits _{n=1}^{N}\sum \nolimits _{k=1}^{K}B_{n,k}}{M-1}}\right) ^{\frac{1}{NK}}\left( \prod \limits _{n=1}^{N}\prod \limits _{k=1}^{K}\alpha _{n,k}\sum \limits _{i=1,i\ne n}^{N}P_{i}^{S}\right) ^{\frac{1}{NK}}\nonumber \\= & {} NK\left( 2^{-\frac{B_{tot}}{M-1}}\right) ^{\frac{1}{NK}}\left( \prod \limits _{n=1}^{N}\prod \limits _{k=1}^{K}\alpha _{n,k}\sum \limits _{i=1,i\ne n}^{N}P_{i}^{S}\right) ^{\frac{1}{NK}}, \end{aligned}$$
(7.25)

where equality holds if and only if the terms \(\alpha _{n,k}\sum \nolimits _{i=1,i\ne n}^{N}P_{i}^{S}2^{-\frac{B_{n,k}}{M-1}}, \forall n, k\) are all equal. In other words, the objective function in (7.24) is minimized when the following condition holds:

$$\begin{aligned} \alpha _{n,k}\sum \limits _{i=1,i\ne n}^{N}P_{i}^{S}2^{-\frac{B_{n,k}}{M-1}}=\left( 2^{-\frac{B_{tot}}{M-1}}\right) ^{\frac{1}{NK}}\left( \prod \limits _{n=1}^{N}\prod \limits _{k=1}^{K}\alpha _{n,k}\sum \limits _{i=1,i\ne n}^{N}P_{i}^{S}\right) ^{\frac{1}{NK}}. \end{aligned}$$
(7.26)

Hence, based on the relaxed optimization problem, the optimal number of feedback bits for the kth MU in the nth cluster is given by

$$\begin{aligned} B_{n,k}=\frac{B_{\mathrm {tot}}}{NK}-\frac{M-1}{NK}\sum \limits _{i=1}^{N}\sum \limits _{j=1}^{K}\log _2\left( \alpha _{i,j}\sum \limits _{l=1,l\ne i}^{N}P_{l}^{S}\right) +(M-1)\log _2\left( \alpha _{n,k}\sum \limits _{l=1,l\ne n}^{N}P_{l}^{S}\right) . \end{aligned}$$
(7.27)

Given the channel statistics and the transmit power allocation, the feedback distribution is easily determined from (7.27). Since the number of feedback bits must be an integer in practice, we use the largest integer not exceeding \(B_{n,k}\) in (7.27), i.e., \(\lfloor B_{n,k}\rfloor , \forall n, k\).
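The relaxed solution can be checked numerically. The sketch below uses hypothetical helper names and 0-indexed nested lists. It solves the equal-terms condition (7.26) for \(B_{n,k}\), which gives \(B_{n,k}=\frac{B_{\mathrm {tot}}}{NK}+(M-1)\left( \log _2 c_{n,k}-\frac{1}{NK}\sum _{i,j}\log _2 c_{i,j}\right) \) with \(c_{n,k}=\alpha _{n,k}\sum _{l\ne n}P_l^S\). It then applies the floor operation and evaluates the objective of \(J_2\), so that the result can be compared with an equal split. Two further choices are ours rather than the chapter's: clipping negative values to zero to respect C4 (the relaxation above does not enforce C4), and the numbers in the usage lines.

```python
import math


def feedback_bit_allocation(alpha, p_cluster, m_antennas, b_tot):
    """Relaxed and integer feedback bits from the equal-terms condition (7.26).

    alpha:     N x K nested list, alpha[n][k] is the path loss of MU k in cluster n.
    p_cluster: length-N list of cluster powers P_n^S.
    Returns (b_real, b_int), both N x K nested lists.
    """
    n_clu, k_usr = len(alpha), len(alpha[0])
    total_p = sum(p_cluster)
    # c_{n,k} = alpha_{n,k} * sum_{l != n} P_l^S
    c = [[alpha[n][k] * (total_p - p_cluster[n]) for k in range(k_usr)]
         for n in range(n_clu)]
    mean_log = sum(math.log2(x) for row in c for x in row) / (n_clu * k_usr)
    # Equalize c_{n,k} * 2^{-B_{n,k}/(M-1)} over all MUs subject to sum(B) = b_tot
    b_real = [[b_tot / (n_clu * k_usr) + (m_antennas - 1) * (math.log2(x) - mean_log)
               for x in row] for row in c]
    # Floor to integers; clipping at zero (constraint C4) is an added assumption
    b_int = [[max(0, math.floor(b)) for b in row] for row in b_real]
    return b_real, b_int


def inter_cluster_interference(alpha, p_cluster, m_antennas, bits):
    """Objective of J_2 (closed form of (7.23)) for a given bit allocation."""
    total_p = sum(p_cluster)
    return sum(alpha[n][k] * (total_p - p_cluster[n]) * 2 ** (-bits[n][k] / (m_antennas - 1))
               for n in range(len(alpha)) for k in range(len(alpha[0])))


alpha = [[1.0, 0.5], [0.8, 0.4], [0.6, 0.3]]  # illustrative values only
p_cluster = [4.0, 3.0, 3.0]
b_real, b_int = feedback_bit_allocation(alpha, p_cluster, m_antennas=6, b_tot=12)
equal = [[2, 2], [2, 2], [2, 2]]
print(inter_cluster_interference(alpha, p_cluster, 6, b_real)
      <= inter_cluster_interference(alpha, p_cluster, 6, equal))  # True
```

In this example, one relaxed value is negative (about \(-2.05\) bits). This shows that the relaxed solution can violate C4 for MUs that receive little inter-cluster interference, which is why some rule such as the clipping used here is needed in practice.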

Remark 2

The number of feedback bits allocated to the kth MU in the nth cluster is determined by the average inter-cluster interference experienced by this MU relative to the (geometric) average over all MUs. In other words, an MU that suffers more inter-cluster interference is allocated more feedback bits. This enables more accurate ZFBF and reduces the total interference.

3.4 Mode Selection

As discussed above, the performance of the multiple-antenna NOMA system is limited by both inter-cluster and intra-cluster interference. Although ZFBF at the BS and SIC at the MUs are applied jointly, residual interference remains. Intuitively, the strength of the residual interference depends mainly on the number of clusters N and the number of MUs per cluster K. For instance, increasing the number of MUs per cluster may reduce the inter-cluster interference but increases the intra-cluster interference. Thus, it is desirable to adjust the transmission mode dynamically according to the channel conditions and system parameters, where the mode comprises the number of clusters and the number of MUs per cluster. For dynamic mode selection, we have the following lemma:

Lemma 2

If the BS has no CSI about the downlink, it is optimal to set \(N=1\). On the other hand, if the BS has perfect CSI about the downlink, \(K=1\) is the best choice.

Proof

First, if there is no CSI, namely \(\rho _{n,k}=0, \forall n, k\), ZFBF cannot be used to mitigate the inter-cluster interference. If all the MUs belong to one cluster, the interference is mitigated as much as possible by SIC. With perfect CSI at the BS, ZFBF completely eliminates the inter-cluster interference, and placing a single MU in each cluster removes the intra-cluster interference altogether. Thus, it is optimal to arrange one MU in each cluster. \(\square \)

Above, we considered the two extreme scenarios of no CSI and perfect CSI at the BS. In practice, the BS has partial CSI obtained through channel estimation or quantized feedback. Thus, we propose to choose the transmission mode dynamically so as to maximize the sum of average transmission rates, which is equivalent to the following optimization problem:

$$\begin{aligned}&J_3: \max \limits _{N, K}\sum \limits _{n=1}^{N}\sum \limits _{k=1}^K{R}_{n,k}\nonumber \\ \mathrm {s.t.}&\,\,\mathrm {C5:}\;NK=N_u,\nonumber \\&\,\,\mathrm {C6:}\;N>0,\nonumber \\&\,\,\mathrm {C7:}\;K>0, \end{aligned}$$
(7.28)

where \(N_u\) is the number of MUs in the multiple-antenna NOMA system. \(J_3\) is also an integer programming problem, so a closed-form solution is difficult to obtain. Nevertheless, the optimal solution can be found by numerical search, whose complexity is \(O(N^K)\). To control the complexity of SIC, the number of MUs per cluster is usually small, e.g., \(K=2\), so the complexity of the numerical search is acceptable.
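A minimal exhaustive-search sketch for \(J_3\) is shown below. Constraints C5-C7 restrict the candidates to the factor pairs \((N, K)\) of \(N_u\). The average sum rate of each mode comes from a caller-supplied function `sum_rate(N, K)`. This is a hypothetical placeholder that could be filled with the closed-form rate expressions or with simulation. The toy evaluator in the usage lines is purely illustrative.

```python
def candidate_modes(n_users):
    """All transmission modes (N, K) with N * K = n_users and N, K > 0 (C5-C7)."""
    return [(n, n_users // n) for n in range(1, n_users + 1) if n_users % n == 0]


def select_mode(n_users, sum_rate):
    """Exhaustive search for J_3; sum_rate(N, K) is supplied by the caller."""
    return max(candidate_modes(n_users), key=lambda mode: sum_rate(*mode))


def toy_sum_rate(n, k):
    """Hypothetical evaluator for illustration only (peaks at N = 3)."""
    return -abs(n - 3)


print(candidate_modes(6))            # [(1, 6), (2, 3), (3, 2), (6, 1)]
print(select_mode(6, toy_sum_rate))  # (3, 2)
```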

3.5 Joint Optimization Scheme

In fact, the transmit power, feedback bits, and transmission mode are coupled and jointly determine the performance. Therefore, it is better to optimize these variables jointly to further improve the performance of multiple-antenna NOMA systems. For example, for a given transmission mode, we can first allocate the transmit power according to (7.22) and then distribute the feedback bits according to (7.27). Finally, we select the transmission mode that yields the largest sum rate. The complexity of the joint optimization is dominated by mode selection. As mentioned above, if the number of MUs per cluster is small, the complexity of mode selection is acceptable.

4 Asymptotic Analysis

To provide insightful guidelines for system design, we now carry out an asymptotic analysis of the average sum rate of the system. In particular, two extreme cases are studied, namely the interference-limited and the noise-limited cases.

4.1 Interference-Limited Case

Without loss of generality, we let \(P_{n,k}^S=\theta _{n,k}P_{tot}^S, \forall n, k\), where \(0<\theta _{n,k}<1\) is a power allocation factor. For instance, in the proposed power allocation scheme in Sect. 7.3.2, \(\theta _{n,k}=\frac{\left( \sum \nolimits _{v=1,v\ne n}^N\sum \nolimits _{j=1}^{K}\alpha _{v,j}(1-\rho _{v,j})\right) ^{-1}}{K\left( \sum \nolimits _{l=1}^{N}\left( \sum \nolimits _{v=1,v\ne l}^N\sum \nolimits _{j=1}^{K}\alpha _{v,j}(1-\rho _{v,j})\right) ^{-1}\right) }\). If the total power \(P_{tot}^S\) is large enough, the noise term in the SINR of (7.9) is negligible. In this case, with the help of [47, Eq. (4.3311)], the average transmission rate of the kth MU (\(k>1\)) in the nth cluster reduces to

$$\begin{aligned} R_{n,k}= & {} \frac{1}{\ln (2)}\sum \limits _{i=1}^{N}\varXi _N\left( i,\{\eta _{n,k}^q\}_{q=1}^N\right) \ln (\eta _{n,k}^i)\nonumber \\&-\frac{1}{\ln (2)}\sum \limits _{i=1}^{N}\varXi _N\left( i,\{\beta _{n,k}^v\}_{v=1}^N\right) \ln (\beta _{n,k}^i), \end{aligned}$$
(7.29)

where we have also used the fact that

$$\begin{aligned} \sum \limits _{i=1}^{N}\varXi _N\left( i,\{\eta _{n,k}^q\}_{q=1}^N\right) =\sum \limits _{i=1}^{N}\varXi _N\left( i,\{\beta _{n,k}^v\}_{v=1}^N\right) =1. \end{aligned}$$
(7.30)

Similarly, the asymptotic average transmission rate of the 1st MU in the nth cluster can be obtained as

$$\begin{aligned} R_{n,1}= & {} \frac{1}{\ln (2)}\sum \limits _{i=1}^{N}\varXi _N\left( i,\{\eta _{n,1}^q\}_{q=1}^N\right) \ln \left( \eta _{n,1}^i\right) \nonumber \\&-\frac{1}{\ln (2)}\sum \limits _{i=1}^{N-1}\varXi _{N-1}\left( i,\{\beta _{n,1}^v\}_{v=1}^{N-1}\right) \ln \left( \beta _{n,1}^i\right) . \end{aligned}$$
(7.31)

Combining (7.29) and (7.31), we have the following important result:

Theorem 1

In the high transmit power region, the average transmission rate is independent of \(P_{tot}^S\), so there is a performance ceiling: once \(P_{tot}^S\) exceeds a saturation point, the average transmission rate does not increase further even if the transmit power increases.

Proof

According to the definitions, \(\eta _{n,k}^i\) and \(\beta _{n,k}^i\) can be rewritten as \(\eta _{n,k}^i=\omega _{n,k}^{i}P_{tot}^S\) and \(\beta _{n,k}^i=\psi _{n,k}^{i}P_{tot}^S\), where

$$ \omega _{n,k}^i=\left\{ \begin{array}{ll} \alpha _{n,k}\sum \limits _{j=1}^{k}\theta _{i,j} &{} \text {if}\, i=n\\ \alpha _{n,k}(1-\rho _{n,k})\sum \limits _{l=1}^{K}\theta _{i,l} &{} \text {if}\, i\ne n \end{array} \right. , $$

and

$$ \psi _{n,k}^i=\left\{ \begin{array}{ll} \alpha _{n,k}\sum \limits _{j=1}^{k-1}\theta _{i,j} &{} \text {if}\, i=n\\ \alpha _{n,k}(1-\rho _{n,k})\sum \limits _{l=1}^{K}\theta _{i,l} &{} \text {if}\, i\ne n \end{array} \right. , $$

respectively. Thus, \(\varXi _N\left( i,\{\eta _{n,k}^q\}_{q=1}^N\right) \) and \(\varXi _N\left( i,\{\beta _{n,k}^v\}_{v=1}^N\right) \) are independent of \(P_{tot}^S\). Hence, \(R_{n,k}\) in (7.29) can be rewritten as

$$\begin{aligned} R_{n,k}= & {} \frac{1}{\ln (2)}\sum \limits _{i=1}^{N}\varXi _N\left( i,\{\eta _{n,k}^q\}_{q=1}^N\right) (\ln (P_{tot}^S)+\ln (\omega _{n,k}^i))\nonumber \\&-\frac{1}{\ln (2)}\sum \limits _{i=1}^{N}\varXi _N\left( i,\{\beta _{n,k}^v\}_{v=1}^N\right) (\ln (P_{tot}^S)+\ln (\psi _{n,k}^i))\nonumber \\= & {} \frac{1}{\ln (2)}\sum \limits _{i=1}^{N}\varXi _N\left( i,\{\eta _{n,k}^q\}_{q=1}^N\right) \ln (\omega _{n,k}^i)-\frac{1}{\ln (2)}\sum \limits _{i=1}^{N}\varXi _N\left( i,\{\beta _{n,k}^v\}_{v=1}^N\right) \ln (\psi _{n,k}^i),\nonumber \\ \end{aligned}$$
(7.32)

where the last step in (7.32) follows from the fact that \(\sum \nolimits _{i=1}^{N}\varXi _N\left( i,\{\eta _{n,k}^q\}_{q=1}^N\right) =\sum \nolimits _{i=1}^{N}\varXi _N\left( i,\{\beta _{n,k}^v\}_{v=1}^N\right) =1\). Similarly, we can rewrite \(R_{n,1}\) in (7.31) as

$$\begin{aligned} R_{n,1}= & {} \frac{1}{\ln (2)}\sum \limits _{i=1}^{N}\varXi _N\left( i,\{\eta _{n,1}^q\}_{q=1}^N\right) \ln \left( \omega _{n,1}^i\right) \nonumber \\&-\frac{1}{\ln (2)}\sum \limits _{i=1}^{N-1}\varXi _{N-1}\left( i,\{\beta _{n,1}^v\}_{v=1}^{N-1}\right) \ln \left( \psi _{n,1}^i\right) , \end{aligned}$$
(7.33)

where

$$ \omega _{n,1}^i=\left\{ \begin{array}{ll} \alpha _{n,1}\theta _{i,1} &{} \text {if}\, i=n\\ \alpha _{n,1}(1-\rho _{n,1})\sum \limits _{l=1}^{K}\theta _{i,l} &{} \text {if}\, i\ne n \end{array} \right. , $$

and

$$ \psi _{n,1}^i=\left\{ \begin{array}{ll} \alpha _{n,1}(1-\rho _{n,1})\sum \limits _{l=1}^{K}\theta _{i,l} &{} \text {if}\, i<n\\ \alpha _{n,1}(1-\rho _{n,1})\sum \limits _{l=1}^{K}\theta _{i+1,l} &{} \text {if}\, i\ge n \end{array} \right. . $$

Note that both (7.32) and (7.33) are independent of \(P_{tot}^S\), which proves Theorem 1. \(\square \)

Now, we investigate the relation between the performance ceiling in Theorem 1 and the CSI accuracy \(\rho _{n,k}\). First, we consider \(R_{n,k}\) with \(k>1\). As \(\rho _{n,k}\) approaches 1, the inter-cluster interference becomes negligible, and \(R_{n,k}\) reduces to

$$\begin{aligned} R_{n,k}^{\text {ideal}}= & {} \mathrm {E}\left[ \log _2\left( \alpha _{n,k}|\mathbf {h}_{n,k}^H\mathbf {w}_n|^2\sum \limits _{j=1}^{k}P_{n,j}^{S}\right) \right] -\mathrm {E}\left[ \log _2\left( \alpha _{n,k}|\mathbf {h}_{n,k}^H\mathbf {w}_n|^2\sum \limits _{j=1}^{k-1}P_{n,j}^{S}\right) \right] \nonumber \\= & {} \log _2\left( \frac{\sum \nolimits _{j=1}^{k}\theta _{n,j}}{\sum \nolimits _{j=1}^{k-1}\theta _{n,j}}\right) . \end{aligned}$$
(7.34)

Even with perfect CSI, the average transmission rate of the \((k>1)\)th MU is still upper bounded. The bound \(\log _2\left( \frac{\sum \nolimits _{j=1}^{k}\theta _{n,j}}{\sum \nolimits _{j=1}^{k-1}\theta _{n,j}}\right) \) depends only on the intra-cluster power allocation factors. It is independent of both the channel gains and \(P_{tot}^S\), and thus cannot be raised by increasing the transmit power. In contrast, for the 1st MU, the SINR \(\gamma _{n,1}\) becomes high if the CSI at the BS is sufficiently accurate. As a result, the constant term 1 in the rate expression is negligible, and the average transmission rate can be approximated as

$$\begin{aligned} R_{n,1}\approx & {} \mathrm {E}\left[ \log _2\left( \frac{\alpha _{n,1}|\mathbf {h}_{n,1}^H\mathbf {w}_n|^2P_{n,1}^{S}}{\alpha _{n,1}(1-\rho _{n,1})\sum \nolimits _{i=1,i\ne n}^{N}|\mathbf {e}_{n,1}^H\mathbf {w}_i|^2\sum \nolimits _{l=1}^{K}P_{i,l}^S}\right) \right] \nonumber \\= & {} \underbrace{\mathrm {E}\left[ \log _2\left( \alpha _{n,1}|\mathbf {h}_{n,1}^H\mathbf {w}_n|^2P_{n,1}^{S}\right) \right] }_{\text {Ideal average rate}}\nonumber \\&-\underbrace{\mathrm {E}\left[ \log _2\left( \alpha _{n,1}(1-\rho _{n,1})\sum \limits _{i=1,i\ne n}^{N}|\mathbf {e}_{n,1}^H\mathbf {w}_i|^2\sum \limits _{l=1}^{K}P_{i,l}^S\right) \right] }_{\text {Rate loss due to imperfect CSI}}. \end{aligned}$$
(7.35)

In (7.35), the first term is the ideal average transmission rate with perfect CSI, and the second is the rate loss caused by imperfect CSI. We first examine the ideal average transmission rate, which is given by

$$\begin{aligned} R_{n,1}^{\text {ideal}}= & {} \mathrm {E}\left[ \log _2\left( \alpha _{n,1}P_{tot}^{S}\theta _{n,1}|\mathbf {h}_{n,1}^H\mathbf {w}_n|^2\right) \right] \nonumber \\= & {} \log _2\left( \alpha _{n,1}P_{tot}^{S}\theta _{n,1}\right) -\frac{C}{\ln (2)}. \end{aligned}$$
(7.36)
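(7.36) can be sanity-checked by Monte Carlo simulation. The sketch below makes two assumptions. First, as in the noise-limited analysis of Sect. 7.4.2, \(|\mathbf {h}_{n,1}^H\mathbf {w}_n|^2\) is unit-mean exponential, i.e., \(\chi ^2(2)\)-distributed with density \(e^{-x}\). Second, C denotes the Euler–Mascheroni constant, since \(\mathrm {E}[\ln X]=-C\) for such X. The parameter values are arbitrary illustrations.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # assumed to be the constant C in (7.36)


def ideal_rate_closed_form(alpha, p_tot, theta):
    """Right-hand side of (7.36): log2(alpha * P_tot^S * theta) - C / ln 2."""
    return math.log2(alpha * p_tot * theta) - EULER_GAMMA / math.log(2)


def ideal_rate_monte_carlo(alpha, p_tot, theta, trials=200_000, seed=1):
    """Sample mean of log2(alpha * P_tot^S * theta * X) with X ~ Exp(1)."""
    rng = random.Random(seed)
    scale = alpha * p_tot * theta
    total = sum(math.log2(scale * rng.expovariate(1.0)) for _ in range(trials))
    return total / trials


print(ideal_rate_closed_form(0.5, 100.0, 0.25), ideal_rate_monte_carlo(0.5, 100.0, 0.25))
```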

Note that with perfect CSI at the BS, the average transmission rate of the 1st MU grows with \(\log _2(P_{tot}^{S})\) without bound. In contrast, as seen in (7.34), the rate of the \((k>1)\)th MU is upper bounded under the same condition. This reconfirms the claim in Lemma 2 that it is optimal to place one MU in each cluster in the presence of perfect CSI. Next, we investigate the rate loss due to imperfect CSI, which can be expressed as

$$\begin{aligned} R_{n,1}^{\text {loss}}= & {} \mathrm {E}\bigg [\log _2\bigg (\alpha _{n,1}(1-\rho _{n,1})P_{tot}^S\sum \limits _{i=1,i\ne n}^{N}|\mathbf {e}_{n,1}^H\mathbf {w}_i|^2\sum \limits _{t=1}^{K}\theta _{i,t}\bigg )\bigg ]\nonumber \\= & {} \log _2\left( \alpha _{n,1}(1-\rho _{n,1})P_{tot}^S\right) -\frac{1}{\ln (2)}\sum \limits _{i=1}^{N-1}\varXi _{N-1}\left( i,\{\mu _{n,1}^v\}_{v=1}^{N-1}\right) \left( C-\ln \left( \mu _{n,1}^i\right) \right) , \end{aligned}$$
(7.37)

where

$$ \mu _{n,1}^v=\left\{ \begin{array}{ll} \sum \limits _{l=1}^{K}\theta _{v,l} &{} \text {if}\, v<n\\ \sum \limits _{l=1}^{K}\theta _{v+1,l} &{} \text {if}\, v\ge n \end{array} \right. . $$

For a given \(\rho _{n,1}\), the rate loss \(R_{n,1}^{\text {loss}}\) grows as the total transmit power \(P_{tot}^S\) increases. For the achieved rate to increase at the same pace as the ideal rate \(R_{n,1}^{\text {ideal}}\), the CSI accuracy \(\rho _{n,1}\) must satisfy the condition given in the following theorem:

Theorem 2

The average transmission rate of the 1st MU in the nth cluster with imperfect CSI keeps a fixed gap to the ideal rate only when \((1-\rho _{n,1})P_{tot}^S\) equals a constant \(\varepsilon \). Specifically, the transmit power of the training sequence should satisfy \(P_{n,1}^{p}=\frac{P_{tot}^S/\varepsilon -1}{\alpha _{n,1}\tau }\) in TDD systems. In FDD systems, the number of feedback bits should satisfy \(B_{n,1}=(M-1)\log _2(P_{tot}^S/\varepsilon )\).

Proof

The proof is straightforward. For TDD systems, we substitute \(\rho _{n,1}=1-\frac{1}{1+\tau P_{n,1}^{p}\alpha _{n,1}}\) into \((1-\rho _{n,1})P_{tot}^S=\varepsilon \), which gives \(P_{n,1}^{p}=\frac{P_{tot}^S/\varepsilon -1}{\alpha _{n,1}\tau }\). For FDD systems, we substitute \(\rho _{n,1}=1-2^{-\frac{B_{n,1}}{M-1}}\) into the same condition, which gives \(B_{n,1}=(M-1)\log _2(P_{tot}^S/\varepsilon )\). This proves Theorem 2. \(\square \)
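The two requirements of Theorem 2 are easy to evaluate numerically. The helpers below are a sketch with names and test values of our choosing. They compute the training power and the real-valued number of feedback bits from \(P_{tot}^S\) and \(\varepsilon \), and check both against the CSI-accuracy expressions used in the proof. They assume \(P_{tot}^S>\varepsilon \); otherwise the required training power would be negative.

```python
import math


def tdd_pilot_power(p_tot, eps, alpha, tau):
    """Training power P_{n,1}^p keeping (1 - rho_{n,1}) * P_tot^S = eps (TDD)."""
    return (p_tot / eps - 1) / (alpha * tau)


def fdd_feedback_bits(p_tot, eps, m_antennas):
    """Real-valued feedback bits B_{n,1} keeping (1 - rho_{n,1}) * P_tot^S = eps (FDD)."""
    return (m_antennas - 1) * math.log2(p_tot / eps)


def rho_tdd(p_pilot, alpha, tau):
    """TDD CSI accuracy expression used in the proof of Theorem 2."""
    return 1 - 1 / (1 + tau * p_pilot * alpha)


def rho_fdd(bits, m_antennas):
    """FDD CSI accuracy expression used in the proof of Theorem 2."""
    return 1 - 2 ** (-bits / (m_antennas - 1))


p_tot, eps = 1000.0, 2.0
bits = fdd_feedback_bits(p_tot, eps, 6)
print(bits, (1 - rho_fdd(bits, 6)) * p_tot)
```

With \(M=6\), every 10 dB increase of \(P_{tot}^S\) requires \((M-1)\log _2 10\approx 16.6\) additional feedback bits to keep the gap fixed.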

Remark 3

Two quantities govern the CSI accuracy at the BS. In TDD systems, it is \(P_{n,1}^{p}\tau \), the transmit energy of the training sequence. In FDD systems, it is \(\frac{B_{n,1}}{M-1}\), the spatial resolution. Specifically, for a given CSI accuracy requirement, the training sequence can be shortened by increasing its transmit power, leaving more time for data transmission in each time slot. However, to keep the training sequences pairwise orthogonal, the length of the training sequence \(\tau \) must be no smaller than the number of MUs; that is, the minimum value of \(\tau \) is NK. Similarly, in FDD systems, the number of feedback bits can be reduced by increasing the number of antennas M. Yet, to provide sufficient spatial degrees of freedom for ZFBF at the BS, M must be no smaller than \((N-1)K+1\). This is because the beam \(\mathbf {w}_i\) for the ith cluster must lie in the null space of the channels of the \((N-1)K\) MUs in the other \(N-1\) clusters.
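The two dimensioning rules in this remark can be checked with a pair of one-line helpers. This is an illustrative sketch with names of our choosing. It checks only the pilot-orthogonality and null-space conditions stated above, not the resulting CSI accuracy.

```python
def min_training_length(n_clusters, users_per_cluster):
    """Smallest tau that admits pairwise-orthogonal pilots for all N*K MUs."""
    return n_clusters * users_per_cluster


def zfbf_feasible(m_antennas, n_clusters, users_per_cluster):
    """True if M >= (N-1)K + 1, so each cluster beam can null the other (N-1)K MUs."""
    return m_antennas >= (n_clusters - 1) * users_per_cluster + 1


# Factor pairs (N, K) of six MUs with M = 6 antennas
for n, k in [(6, 1), (3, 2), (2, 3), (1, 6)]:
    print((n, k), min_training_length(n, k), zfbf_feasible(6, n, k))
```

For six MUs and \(M=6\), as in the simulations, every factor pair \((N, K)\) satisfies the ZFBF condition, and the minimum training length is 6 in each case.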

Furthermore, substituting (7.36) and (7.37) into (7.35), we have

$$\begin{aligned} R_{n,1}\approx & {} -\log _2(1-\rho _{n,1})+\log _2(\theta _{n,1})-\sum \limits _{i=1}^{N-1}\varXi _{N-1}\left( i,\{\mu _{n,1}^v\}_{v=1}^{N-1}\right) \log _2\left( \mu _{n,1}^i\right) . \end{aligned}$$
(7.38)

Interestingly, for a given power allocation scheme, this limit on \(R_{n,1}\) is independent of the channel conditions. As analyzed above, the average rate can be improved by improving the CSI accuracy. In particular, for FDD systems, we have the following lemma:

Lemma 3

In the high-power region with a large number of feedback bits, the average rate of the 1st MU increases linearly with the number of feedback bits.

Proof

Replacing \(\rho _{n,1}\) in (7.38) with \(\rho _{n,1}=1-2^{-\frac{B_{n,1}}{M-1}}\), \(R_{n,1}\) becomes

$$\begin{aligned} R_{n,1}\approx & {} \frac{B_{n,1}}{M-1}+\log _2(\theta _{n,1})-\sum \limits _{i=1}^{N-1}\varXi _{N-1}\left( i,\{\mu _{n,1}^v\}_{v=1}^{N-1}\right) \log _2\left( \mu _{n,1}^i\right) , \end{aligned}$$
(7.39)

which yields Lemma 3. \(\square \)

4.2 Noise-Limited Case

If the interference term is negligible compared with the noise term owing to low transmit power, the SINR \(\gamma _{n,k}, \forall n, k\) reduces to

$$\begin{aligned} \gamma _{n,k}=\alpha _{n,k}|\mathbf {h}_{n,k}^H\mathbf {w}_n|^2P_{n,k}^{S}, \end{aligned}$$
(7.40)

which is equivalent to the interference-free case. As discussed earlier, \(|\mathbf {h}_{n,k}^H\mathbf {w}_n|^2\) is \(\chi ^2(2)\) distributed, so the average transmission rate can be computed as

$$\begin{aligned} R_{n,k}= & {} \int _0^{\infty }\log _2\left( 1+P_{n,k}^{S}\alpha _{n,k}x\right) \exp (-x)dx\nonumber \\= & {} -\frac{1}{\ln (2)}\exp \left( \frac{1}{P_{n,k}^{S}\alpha _{n,k}}\right) E_{\mathrm {i}}\left( -\frac{1}{P_{n,k}^{S}\alpha _{n,k}}\right) . \end{aligned}$$
(7.41)

Note that (7.41) is independent of the CSI accuracy, so channel estimation and CSI feedback are unnecessary in this scenario. Since both intra-cluster and inter-cluster interference are negligible, neither ZFBF at the BS nor SIC at the MUs is required, and all optimization schemes asymptotically achieve the same performance.
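The noise-limited closed form can be cross-checked against direct numerical integration. The sketch below is written in pure Python so that it needs no special-function library. It computes \(E_1(z)=-E_{\mathrm {i}}(-z)\) from its standard power series and evaluates the integral in (7.41) with the composite Simpson rule, truncated at \(x=60\) where \(e^{-x}\) is negligible. Here `snr` stands for \(P_{n,k}^{S}\alpha _{n,k}\). The check also makes explicit the factor \(1/\ln 2\) that converts the natural-logarithm identity \(\int _0^\infty \ln (1+ax)e^{-x}dx=e^{1/a}E_1(1/a)\) into bits. Restricting the series to \(0<z\le 5\) is a choice of this sketch, made to avoid numerical cancellation.

```python
import math

EULER_GAMMA = 0.5772156649015329


def exp1(z, terms=80):
    """Exponential integral E1(z) = -Ei(-z) via its power series (0 < z <= 5 only)."""
    if not 0 < z <= 5:
        raise ValueError("this series sketch is only used for 0 < z <= 5")
    series, term = 0.0, 1.0
    for k in range(1, terms + 1):
        term *= -z / k  # term = (-z)^k / k!
        series += term / k
    return -EULER_GAMMA - math.log(z) - series


def avg_rate_closed_form(snr):
    """(7.41) with snr = P_{n,k}^S * alpha_{n,k}: exp(1/snr) * E1(1/snr) / ln 2."""
    z = 1.0 / snr
    return math.exp(z) * exp1(z) / math.log(2)


def avg_rate_numeric(snr, upper=60.0, n=20_000):
    """Composite Simpson rule (n even) for the integral in (7.41), truncated at upper."""
    def f(x):
        return math.log2(1 + snr * x) * math.exp(-x)

    h = upper / n
    s = f(0.0) + f(upper) + sum((4 if i % 2 else 2) * f(i * h) for i in range(1, n))
    return s * h / 3


print(avg_rate_closed_form(1.0), avg_rate_numeric(1.0))
```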

5 Simulation Results

To evaluate the performance of the proposed multiple-antenna NOMA scheme, we present simulation results for several scenarios. Unless otherwise specified, we set \(M=6\), \(N=3\), \(K=2\), and \(B_{tot}=12\), and \(\alpha _{n,k}\) and \(\rho _{n,k}\) are given in Table 7.1. In addition, SNR (in dB) denotes \(10\log _{10}P_{tot}^S\).

Table 7.1 Parameter Table for \((\alpha _{n,k}, \rho _{n,k})\), \(\forall n\in [1,3]\), and \(k\in [1, 2]\)
Fig. 7.2 Comparison of theoretical expressions and simulation results

First, we verify the accuracy of the derived theoretical expressions. As seen in Fig. 7.2, the theoretical expressions for both the 1st and the 2nd MUs in the 1st cluster agree closely with the simulation results over the whole SNR region, which confirms their accuracy. As the principle of NOMA implies, the 1st MU outperforms the 2nd MU. At high SNR, the average rates of both MUs saturate, which again confirms Theorem 1.

Second, we compare the proposed power allocation scheme with the equal power allocation scheme and the fixed power allocation scheme proposed in [5]. The fixed scheme splits the power between the two MUs in a cluster with a fixed ratio of 1:4 to facilitate SIC. Fig. 7.3 shows that the proposed power allocation scheme offers a clear performance gain over both baseline schemes, especially in the medium SNR region. Since practical communication systems generally operate at medium SNR, the proposed scheme can meet a given performance requirement at a lower SNR. As the SNR increases, the proposed scheme and the equal allocation scheme converge to the same saturated sum rate, whereas the fixed allocation scheme suffers a clear performance loss.

Fig. 7.3 Performance comparison of different power allocation schemes

Next, we examine the benefit of feedback allocation for the FDD-based NOMA system with equal power allocation (Fig. 7.4). As analyzed in Sect. 7.4.2, at very low SNR, i.e., in the noise-limited case, the average rate is independent of the CSI accuracy, so the two schemes asymptotically achieve the same sum rate. As the SNR increases, the gain of the proposed feedback allocation scheme grows. At high SNR, both schemes saturate, and the proposed scheme attains its largest performance gain; for instance, at SNR \(= 30\) dB, the gain exceeds 0.5 b/s/Hz. Furthermore, we investigate the impact of the total number of feedback bits on the average rates of different MUs at SNR \(=\) 35 dB. As shown in Fig. 7.5, the 1st MU clearly outperforms the 2nd MU. Moreover, the average rate of the 1st MU is nearly a linear function of the number of feedback bits, which reconfirms Lemma 3.

Fig. 7.4 Performance comparison of different feedback allocation schemes

Fig. 7.5 Asymptotic performance with a large number of feedback bits

Fig. 7.6 Performance comparison of different transmission modes

Then, in Fig. 7.6, we investigate the impact of the transmission mode on the performance of NOMA systems at \(\text {SNR} = 10\) dB with equal power allocation. To focus on the effect of the transmission mode, we set the CSI accuracy of all downlink channels to the same value \(\rho \). We consider four fixed transmission modes under the same channel conditions, with six MUs in total. Consistent with Lemma 2, mode 4 with \(N=1\) and \(K=6\) achieves the largest sum rate at low CSI accuracy, while mode 1 with \(N=6\) and \(K=1\) performs best at high CSI accuracy. In addition, at medium CSI accuracy, mode 2 with \(N=3\) and \(K=2\) is optimal, since it strikes the best balance between intra-cluster and inter-cluster interference. Thus, we propose to select the transmission mode dynamically according to the channel conditions and system parameters. As shown by the red line in Fig. 7.6, dynamic mode selection always obtains the maximum sum rate.

Fig. 7.7 Performance comparison of a joint optimization scheme and a fixed allocation scheme

Finally, we demonstrate the superiority of the proposed joint optimization scheme for NOMA systems at \(\text {SNR}=10\) dB, taking a fixed NOMA-based scheme and an OMA-based TDMA scheme as baselines. Specifically, the joint optimization scheme first distributes the transmit power under equal feedback allocation, then allocates the feedback bits based on the resulting power, and finally selects the optimal transmission mode. The fixed scheme always adopts mode 2 (\(N=3, K=2\)) with equal power and feedback allocation. The TDMA scheme allocates equal time slots to the six MUs and applies maximum ratio transmission (MRT) based on the available CSI at the BS to maximize the rate. For clarity, we use \(\rho \) to denote the CSI accuracy under equal feedback allocation; in other words, the total number of feedback bits is \(B_{tot}=-NK(M-1)\log _2(1-\rho )\). As seen in Fig. 7.7, the fixed scheme outperforms TDMA at low and high CSI accuracy and performs slightly worse in the medium regime. However, the proposed joint optimization scheme performs much better than both baseline schemes, and the gap becomes substantial at high CSI accuracy. For instance, the gain is about 3 b/s/Hz at \(\rho =0.8\) and more than 5 b/s/Hz at \(\rho =0.9\). As analyzed in Lemma 2 and confirmed by Fig. 7.6, mode 2 is optimal when \(\rho \) is larger than 0.8, which is a common CSI accuracy in practical systems. Hence, the joint optimization scheme reduces to joint power and feedback allocation, which has very low complexity. Therefore, the proposed NOMA scheme with joint optimization achieves good performance with low complexity and is a promising technique for future wireless communication systems.
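The mapping between \(\rho \) and the total feedback budget in this comparison can be tabulated with a short sketch using hypothetical helper names. It simply evaluates \(B_{tot}=-NK(M-1)\log _2(1-\rho )\) and its inverse for the default \(M=6\), \(N=3\), \(K=2\).

```python
import math


def total_bits_for_accuracy(rho, m_antennas, n_clusters, users_per_cluster):
    """B_tot = -N*K*(M-1)*log2(1 - rho) under equal feedback allocation."""
    return -n_clusters * users_per_cluster * (m_antennas - 1) * math.log2(1 - rho)


def accuracy_for_total_bits(b_tot, m_antennas, n_clusters, users_per_cluster):
    """Inverse map: rho = 1 - 2^{-B_tot / (N*K*(M-1))}."""
    return 1 - 2 ** (-b_tot / (n_clusters * users_per_cluster * (m_antennas - 1)))


for rho in (0.8, 0.9):
    print(rho, round(total_bits_for_accuracy(rho, 6, 3, 2), 2))
# 0.8 69.66
# 0.9 99.66
```

Hence \(\rho =0.8\) and \(\rho =0.9\) correspond to roughly 70 and 100 feedback bits in total, i.e., about 11.6 and 16.6 bits per MU.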

6 Conclusion

This chapter provided a comprehensive solution for designing, analyzing, and optimizing NOMA over a general multiuser multiple-antenna downlink in both TDD and FDD modes. First, we proposed a new framework for multiple-antenna NOMA. Then, we analyzed its performance and derived exact closed-form expressions for the average transmission rates. Afterward, we optimized three key parameters of multiple-antenna NOMA, i.e., transmit power, feedback bits, and transmission mode. Finally, we conducted an asymptotic performance analysis and obtained insights into system performance and design guidelines.