
Multiple access is an essential physical-layer technique in wireless communication networks that allows multiple mobile users to access the network simultaneously. Driven by the upsurge of devices expected in the fifth generation (5G) and beyond, future wireless communication networks are foreseen to operate in dynamic regimes ranging from underloaded (where the number of scheduled devices is smaller than the number of transmit antennas on each access point) to overloaded (where the number of scheduled devices is larger than the number of transmit antennas on each access point). In addition, each transmitter must simultaneously serve devices with heterogeneous capabilities, deployments, and qualities of channel state information at the transmitter (CSIT), since devices for 5G and beyond are increasingly diverse: they range from low-end units such as Internet of Things (IoT) and machine-type communications (MTC) devices to high-end equipment such as smartphones, with varied user deployments and applications. The resulting requirements for massive connectivity, high throughput, and quality of service (QoS) heterogeneity have recently sparked interest in redesigning multiple access techniques for the downlink of communication systems.

This chapter first reviews state-of-the-art multiple access techniques, including their benefits and limitations. It then introduces a promising multiple access candidate for 5G and beyond, rate-splitting multiple access (RSMA), and gives a comprehensive comparison among all multiple access techniques. The challenges and future trends of using RSMA are summarized at the end.

1 Evolution of Multiple Access Techniques

The past decades have witnessed the development of multiple access techniques brought by the evolution of cellular networks from the first generation (1G) to 5G. From orthogonal multiple access (OMA) to non-orthogonal multiple access (NOMA) and space-division multiple access (SDMA), multiple access techniques have progressed toward serving more users non-orthogonally in each subcarrier due to the scarcity of spectrum. In this section, those existing multiple access techniques are reviewed.

1.1 Orthogonal Multiple Access (OMA)

The 1G wireless communication system introduced in the 1980s employed frequency-division multiple access (FDMA), where the frequency bandwidth is divided into nonoverlapping frequency sub-channels and each user is allocated an independent sub-channel. It was used to support the original analog voice services. The second generation (2G) was developed in the 1990s to further enhance the voice service quality as well as to enable short messaging service. The 2G standard system, Global System for Mobile Communications (GSM), adopts time-division multiple access (TDMA), where the frequency domain is shared by all users while the time domain is divided into time slots occupied by different users. The third generation (3G), introduced in the 2000s, opened the new dimension of code for multiple access design: code-division multiple access (CDMA) was commercially applied to support TV streaming, mobile video calls, and so on. Different from FDMA and TDMA, CDMA enables simultaneous transmission for multiple users over the same sub-channels by employing spread spectrum technology to avoid inter-user interference. In 2009, the fourth generation (4G) based on the long-term evolution (LTE) standard was developed to meet the increasing user demand for more sophisticated mobile devices. By employing orthogonal frequency-division multiple access (OFDMA) as the standard multiple access technique, the time and frequency resources are further divided into narrow time slots and subcarriers, respectively. The resource blocks formed by the divided time–frequency grids are allocated to the users dynamically. Compared with FDMA, TDMA, and CDMA, OFDMA is more robust and achieves a higher spectral efficiency. 
The robustness comes from its ability to combat narrowband co-channel interference and multipath fading by scheduling users over orthogonal subcarriers, while the spectral efficiency comes from its ability to multiplex users with low data rates into a wider channel with an adaptive transmission rate for each user. All of the aforementioned multiple access techniques are categorized as orthogonal multiple access (OMA), where users are scheduled in orthogonal dimensions.

1.2 Space-Division Multiple Access (SDMA)

Driven by increasing user demands, access points nowadays are commonly equipped with multiple antennas. The resulting multiple-input multiple-output (MIMO) systems have been widely used in modern wireless standards, including mobile Worldwide Interoperability for Microwave Access (WiMAX) systems, the 4G LTE standard, IEEE 802.11n, and so on. The spatial dimension introduced by MIMO systems opens the door to space-division multiple access (SDMA). By utilizing the spatial dimension to separate users, SDMA allows multiple users to be served simultaneously in the same time–frequency resources.

The only strategy that achieves the capacity region of the multiple-input single-output (MISO)/MIMO (Gaussian) broadcast channel (BC) with perfect CSIT is the complex dirty paper coding (DPC) [1], in which the transmitter relies on perfect CSIT to encode the user messages and perform enhanced interference cancellation such that each encoded data stream experiences no interference from previously encoded streams. However, due to the high computational burden of implementing DPC in practice, linear precoding at the transmitter is more practical and attractive since it simplifies the transmitter design [2]. SDMA is therefore commonly implemented using multi-user linear precoding (MU–LP), either with closed-form beamforming or with beamforming optimized using optimization tools. Though the beamformer might be suboptimal, SDMA based on MU–LP is shown to be useful especially when users experience semi-orthogonal channels and relatively similar channel strengths or long-term signal-to-noise ratios (SNR) [3]. Hence, it is widely adopted and has become the fundamental multiple access technique of various 4G and 5G technologies such as multi-user MIMO (MU-MIMO), massive MIMO, network MIMO, millimeter-wave MIMO, and coordinated multipoint (CoMP). Figure 3.1 illustrates the system model of K-user SDMA based on MU–LP for MISO BC. The messages \(W_1,\ldots ,W_K\) intended for K users are independently encoded into data streams \(s_1,\ldots ,s_K\) and superimposed at the transmitter after linear precoding. Each user directly decodes its intended stream by treating interference from the streams of all other users as noise.

Fig. 3.1 Transmission model of K-user SDMA based on MU–LP
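The MU–LP receiver rule described above (decode the intended stream, treat all other streams as noise) can be sketched numerically. The following is a minimal illustration only, assuming a hypothetical two-user, two-antenna channel with perfect CSIT, zero-forcing precoders, equal power split, and a noise power of 0.1; the helper names `inner`, `scale_to_power`, `null_vector`, and `mu_lp_sinrs` are introduced here for illustration and do not come from the chapter.

```python
import math

def inner(h, p):
    """Return h^H p for two equal-length complex vectors."""
    return sum(hi.conjugate() * pi for hi, pi in zip(h, p))

def scale_to_power(v, power):
    """Scale v so that its squared norm equals `power`."""
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [math.sqrt(power) * x / norm for x in v]

def null_vector(h):
    """For a two-antenna channel h, return a vector p with h^H p = 0."""
    return [h[1].conjugate(), -h[0].conjugate()]

def mu_lp_sinrs(channels, precoders, noise_power):
    """SINR of each user when every other stream is treated as noise."""
    sinrs = []
    for k, h in enumerate(channels):
        gains = [abs(inner(h, p)) ** 2 for p in precoders]
        interference = sum(g for j, g in enumerate(gains) if j != k)
        sinrs.append(gains[k] / (interference + noise_power))
    return sinrs

# Hypothetical channels (assumed values, chosen only for easy arithmetic).
h1 = [1 + 0j, 0.5j]
h2 = [0.5 + 0j, 1 + 0j]

# Zero-forcing: the beam of user-1 lies in the null space of h2, and vice versa.
p1 = scale_to_power(null_vector(h2), 0.5)
p2 = scale_to_power(null_vector(h1), 0.5)

sinrs = mu_lp_sinrs([h1, h2], [p1, p2], noise_power=0.1)
print(sinrs)
```

With perfect CSIT the cross terms vanish, so each user sees only noise. Perturbing `h1` or `h2` after the precoders are computed is a simple way to observe the residual interference that imperfect CSIT introduces.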

The main benefit of SDMA is its capability of achieving all spatial multiplexing gains of MISO BC with perfect CSIT. With the use of MU–LP, the precoder and receiver complexity remains low. However, there are three major limitations of SDMA based on MU–LP, which are summarized as follows:

  • It is only suited to the underloaded regime, and its performance drops dramatically in the overloaded regime, since MU–LP requires more transmit antennas than users in MISO BC in order to generate orthogonal beams and manage multiuser interference efficiently. The current approach to dealing with overloaded scenarios at the transmitter is to divide users into groups and schedule user groups over orthogonal resources (e.g., time/frequency). Users in the same group are served by MU–LP. However, such an approach may increase latency and decrease QoS.

  • The performance of SDMA based on MU–LP is sensitive to the user deployments. It is only suited to cases where users have semi-orthogonal channels with similar channel strengths. Though there exist low-complexity scheduling and user pairing algorithms to pair users with semi-orthogonal channels [2], the complexity of the scheduler increases rapidly when considering the optimal scheduling with an exhaustive search.

  • Though SDMA based on MU–LP achieves the optimal degrees of freedom (DoF; see Footnote 1) in MISO BC with perfect CSIT [4], there is a significant DoF and performance loss when CSIT becomes imperfect [5]. As SDMA design is motivated by perfect CSIT, the direct application of SDMA in the presence of imperfect CSIT results in additional interference coming from the imperfect linear precoder design [4].

1.3 Non-orthogonal Multiple Access (NOMA)

With the aim of further boosting the system spectral efficiency, non-orthogonal multiple access (NOMA), which superposes users in the same time–frequency resources via the power domain or code domain, has been introduced. Specifically, NOMA can be categorized into power-domain NOMA (e.g., [6]) and code-domain NOMA (e.g., sparse code multiple access (SCMA) [7]). In this chapter, we focus on power-domain NOMA (see Footnote 2), which relies on superposition coding (SC) at the transmitter and successive interference cancellation (SIC) at the receiver [6, 8, 9, 10] (also denoted in short as SC–SIC). The study of NOMA starts from the single-input single-output (SISO) (Gaussian) BC and is further extended to the multi-antenna BC. In this chapter, we denote NOMA in SISO BC as single-antenna NOMA and NOMA in MISO/MIMO BC as multi-antenna NOMA.

1.3.1 Single-Antenna NOMA

The study of single-antenna NOMA is inspired by the well-known result in the literature of information theory that SC–SIC is the capacity-achieving technique for SISO BC [11, 12]. Comparing NOMA and OMA, it is well-known that when there are certain channel strength disparities among users, the capacity region of SISO BC is achieved by NOMA, and it is larger than the rate region achieved by OMA (e.g., TDMA) [12]. However, when users experience the same channel strengths, the advantage of NOMA vanishes, and OMA is sufficient to achieve the capacity region [12].

The major benefit of single-antenna NOMA is its ability to improve the spectral efficiency in an overloaded regime by allowing multiple users (that experience different channel strengths or path losses) to be served by one transmitter with a single transmit antenna on the same time–frequency resource. However, its limitation is non-negligible. For a K-user SISO BC, K − 1 layers of SIC are required at the user with the strongest channel to sequentially decode the K − 1 streams of all other co-scheduled users before decoding its intended stream. As the number of users increases, the receiver complexity and the likelihood of error propagation increase significantly. A practical system requires the number of SIC layers to be small. One method is to divide the users into small groups, apply SC–SIC in each group, and schedule groups over orthogonal resources by using OMA, which, however, leads to some performance loss and latency issues.
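The comparison between SC–SIC and OMA in the two-user SISO BC can be checked with a short numerical sketch. It assumes normalized noise power, channel power gains `g_strong` and `g_weak`, and a naive TDMA baseline that uses the full transmit power in each time slot; all function names and numbers are illustrative assumptions rather than values taken from the chapter.

```python
import math

def sc_sic_rates(g_strong, g_weak, p_strong, p_weak, noise=1.0):
    """Two-user SISO BC rates with superposition coding and SIC."""
    # The weak user decodes its own stream, treating the strong user's stream as noise.
    r_weak = math.log2(1 + g_weak * p_weak / (g_weak * p_strong + noise))
    # The strong user must also decode the weak user's stream before cancelling it.
    r_weak_at_strong = math.log2(1 + g_strong * p_weak / (g_strong * p_strong + noise))
    r_weak = min(r_weak, r_weak_at_strong)
    # After SIC the strong user sees only noise.
    r_strong = math.log2(1 + g_strong * p_strong / noise)
    return r_strong, r_weak

def tdma_rates(g_strong, g_weak, power, share_strong, noise=1.0):
    """Rates when the two users are served in disjoint fractions of time."""
    r_strong = share_strong * math.log2(1 + g_strong * power / noise)
    r_weak = (1 - share_strong) * math.log2(1 + g_weak * power / noise)
    return r_strong, r_weak

# Channel strength disparity of 20 dB, total power 1 (assumed values).
noma_strong, noma_weak = sc_sic_rates(100.0, 1.0, p_strong=0.1, p_weak=0.9)

# Pick the TDMA time share that gives the weak user the same rate as NOMA.
share_weak = noma_weak / math.log2(1 + 1.0 * 1.0)
tdma_strong, tdma_weak = tdma_rates(100.0, 1.0, 1.0, share_strong=1 - share_weak)
print(noma_strong, tdma_strong)

# Equal channel strengths: the SC-SIC sum rate equals the single-user rate.
equal_strong, equal_weak = sc_sic_rates(1.0, 1.0, p_strong=0.1, p_weak=0.9)
print(equal_strong + equal_weak)
```

For the same weak-user rate, the strong user obtains a much higher rate under SC–SIC when the channel strengths differ, whereas with equal channel strengths the SC–SIC sum rate collapses to what time sharing already achieves.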

1.3.2 Multi-antenna NOMA

Motivated by the benefits of SC–SIC in SISO BC, NOMA has been further applied to multi-antenna BC. There are two main strategies of multi-antenna NOMA, both of which rely on linearly precoded SC–SIC.

The first strategy, which is simply denoted as “SC–SIC,” is a direct application of SC–SIC to MISO/MIMO BC [13,14,15,16]. However, contrary to SISO BC, multi-antenna BC is non-degraded, i.e., users cannot be ordered according to their channel strengths in general settings. SC–SIC degrades multi-antenna BC by ordering users based on their effective scalar channels obtained at the transmitter after linear precoding. Users with stronger effective channel strengths are required to decode and remove the streams of users with weaker effective channel strengths in a successive manner. Such a strategy forces the multi-antenna non-degraded channel into an effective single-antenna degraded channel, since the user with the strongest channel is required to decode the messages of all other users. SC–SIC wastes all spatial multiplexing gains in MISO/MIMO BC and is only able to cope with scenarios in which user channels are aligned and exhibit certain channel strength disparities. From the DoF perspective, the sum-DoF achieved by SC–SIC is 1 since one receiver has to decode all streams [17]. It is equal to the DoF achieved by OMA or single-user beamforming. Compared with the sum-DoF \(\min \{N_t, K\}\) (where \(N_t\) is the number of transmit antennas and \(K\) is the number of users in MISO BC) achieved by DPC and MU–LP in a MISO BC with perfect CSIT, SC–SIC in multi-antenna BC results in a significant DoF loss, and this loss comes with a dramatic burden on receivers due to the use of SIC layers. In contrast, MU–LP does not require any SIC at the receivers, and it achieves a higher spatial multiplexing gain, which drives the use of MU–MIMO in 4G [18]. To compensate for the DoF loss of SC–SIC, one natural method is to consider dynamic switching between NOMA and SDMA based on the channel states [19]. Figure 3.2 illustrates the transmission model of a three-user SC–SIC with decoding order from user-1 to user-3. Hence, user-3 is required to decode all three streams.

Fig. 3.2 Transmission model of three-user SC–SIC
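The decoding chain of linearly precoded SC–SIC can be made concrete with a small rate evaluation. The sketch below assumes that users are already indexed in decoding order (the stream of user-1 is decoded first, and user-3 decodes all streams), that stream \(s_i\) must be decodable at user-i and at every user later in the order, and that streams not yet decoded act as noise. The aligned channels and power values are hypothetical, and the function name `sc_sic_rates` is introduced here only for illustration.

```python
import math

def inner(h, p):
    """Return h^H p for two equal-length complex vectors."""
    return sum(hi.conjugate() * pi for hi, pi in zip(h, p))

def sc_sic_rates(channels, precoders, noise_power):
    """Rates of linearly precoded SC-SIC; list index equals decoding order."""
    num_users = len(channels)
    rates = []
    for i in range(num_users):
        decodable = []
        # Stream i is decoded by user i and by every user later in the order.
        for k in range(i, num_users):
            signal = abs(inner(channels[k], precoders[i])) ** 2
            # Streams decoded after stream i are still present as interference.
            interference = sum(
                abs(inner(channels[k], precoders[j])) ** 2
                for j in range(i + 1, num_users)
            )
            decodable.append(math.log2(1 + signal / (interference + noise_power)))
        rates.append(min(decodable))
    return rates

# Aligned channels with strength disparity (assumed values); user-3 is strongest.
channels = [[1 + 0j, 0j], [2 + 0j, 0j], [4 + 0j, 0j]]
# More power for weaker users: stream powers 4, 2, and 1.
precoders = [[2 + 0j, 0j], [math.sqrt(2) + 0j, 0j], [1 + 0j, 0j]]

rates = sc_sic_rates(channels, precoders, noise_power=1.0)
print(rates)
```

Because the channels in this example are aligned, the second antenna contributes nothing and the setup behaves like a degraded single-antenna channel, which is the regime where SC–SIC is well suited.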

The second strategy, denoted as “SC–SIC per group,” divides users into different groups, with users in the same group being served by SC–SIC and users across groups being served by SDMA in order to coordinate inter-group interference [6, 20,21,22,23,24]. By combining SDMA and NOMA in SC–SIC per group, multi-antenna BC is decomposed into non-interfering single-antenna NOMA channels, and the DoF loss of SC–SIC can be recovered. However, it is only suited to an overloaded regime, and users within the same group require almost aligned channels, while users in different groups require (semi-)orthogonal channels. Figure 3.3 illustrates the transmission model of four-user SC–SIC per group with user-1 and user-2 in group 1 and user-3 and user-4 in group 2. The inner-group interference is decoded based on SC–SIC, while the inter-group interference is treated as noise based on MU–LP. Assuming that the decoding order in group 1 is from the message of user-1 to that of user-2, user-2 is required to decode the messages of both user-1 and user-2 while fully treating the inter-group interference from user-3 and user-4 as noise. Similarly, in group 2, the decoding order from user-3 to user-4 is assumed.

Fig. 3.3 Transmission model of four-user SC–SIC per group

Like SDMA, multi-antenna NOMA relies on perfect CSIT. When CSIT becomes imperfect, extra multiuser interference is introduced for both the SC–SIC and SC–SIC per group strategies [16]. Similarly to single-antenna NOMA, the major benefit of multi-antenna NOMA is its ability to deal with an overloaded regime with aligned user channels and channel strength disparities. The limitations of multi-antenna NOMA are summarized as follows:

  • The DoF loss of multi-antenna NOMA is severe. The fundamental reason that SC–SIC achieves the capacity region of SISO BC is that users can be ordered based on channel strengths in such a degraded BC. However, forcing the use of SC–SIC in the non-degraded MISO/MIMO BC degrades the multi-antenna BC and results in a waste of spatial resources. Hence, there is an unavoidable DoF loss of SC–SIC in multi-antenna NOMA.

  • Multi-antenna NOMA is only suited for specific user deployments when user channels are aligned with a disparity of channel strengths. It is not suited for general settings.

  • There is a complexity increase at both the transmitter and the receivers in multi-antenna NOMA. At each receiver, multiple layers of SIC are required to decode and remove the interference from other users. At the transmitter equipped with multiple antennas, the optimization of user grouping, decoding orders, and precoders is coupled since the effective user channels are influenced by the precoders. For example, in a three-user MISO BC, SC–SIC requires jointly optimizing the precoding vectors of the three users over six possible decoding orders, while SC–SIC per group requires the user ordering and grouping to be jointly optimized with the precoders. One commonly adopted method to reduce the complexity at the transmitter in multi-antenna NOMA is to assign the same precoding vector to all users within the same group [6], which, however, further deteriorates system performance as the overall search space for optimal precoders is reduced.

  • As multi-antenna NOMA is motivated by the presence of perfect CSIT, it is, like SDMA, sensitive to CSIT inaccuracy.
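The count of six decoding orders in the three-user example above is simply 3!, and the same factorial growth applies for larger user sets. A minimal sketch of the enumeration follows; the helper name `decoding_orders` is hypothetical and only illustrates the size of the search space, not any optimization procedure from the cited works.

```python
import itertools
import math

def decoding_orders(users):
    """Enumerate every possible SIC decoding order for the given users."""
    return list(itertools.permutations(users))

orders = decoding_orders([1, 2, 3])
print(len(orders))
for order in orders:
    print(order)

# The number of orders grows factorially with the number of users.
for num_users in (3, 5, 8):
    print(num_users, math.factorial(num_users))
```

Since the precoders have to be re-optimized for each candidate order, the factorial growth in orders translates directly into transmitter-side complexity.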

Based on the discussion in Sects. 3.1.2 and 3.1.3, we conclude that SDMA and NOMA are actually two extreme interference management strategies in MISO/MIMO BC where users in NOMA try to fully decode and remove interference created by other users, while users in SDMA always fully treat any residual multiuser interference as noise. Moving toward imperfect CSIT, residual inter-user interference is introduced for both SDMA and NOMA (SC–SIC per group).

2 Rate-Splitting Multiple Access (RSMA) for 5G and Beyond

Rate-splitting multiple access (RSMA), based on linearly precoded rate-splitting (RS) at the transmitter and SIC at the receivers, is a more general and powerful multiple access scheme for downlink multi-antenna systems that contains SDMA, NOMA, and OMA as special cases. Unlike SDMA, which fully treats interference as noise, and NOMA, which fully decodes interference, RSMA achieves a more dynamic interference management in which the interference is partially decoded and partially treated as noise at each user [4]. At a transmitter that supports RSMA, user messages are split into common and private parts; the common parts are combined and encoded into common streams to be decoded by multiple users, while the private parts are independently encoded into private streams to be decoded by the corresponding users. All streams are superimposed at the transmitter and broadcast to the users. Each user relies on layers of SIC to decode the common streams before decoding its intended private stream. By adjusting the message split and the power allocation for the common and private streams, RSMA automatically bridges SDMA and NOMA, which rely solely on one of the two extreme interference management strategies or on a combination thereof.

2.1 Literature Review

The fundamental building block of RSMA is the RS technique. Previous studies of RS can be divided into information theoretic works and communication works. Both are summarized in this section.

The information theoretic works on RS are summarized in Table 3.1. The idea of RS is not new. It dates back to Carleial’s work and the Han and Kobayashi (HK) scheme in the 1980s for the two-user SISO interference channel (IC) [25]. This scheme is further proved in [26] to achieve rate regions within 1 bit/s/Hz of the capacity region. The terminology RSMA was first introduced in [27] for the SISO multiple access channel (MAC), where RS based on successive single-user decoding and interference cancellation has been shown to achieve the capacity region of the K-user Gaussian MAC. However, uplink RSMA has fundamentally different motivations and structures from the downlink RSMA considered in this chapter. The use of RS as the building block of the RSMA framework is motivated by recent progress on the fundamental limits of the multi-antenna BC and IC characterized by RS. In contrast with the conventional RS used for the MAC or the two-user SISO IC, the RSMA technique introduced here operates in a different setup, namely, (1) in a BC and (2) with multiple transmit (and receive) antennas. Note that the study of RS in the multi-antenna BC from both information theoretic and communication perspectives was initiated only a few years ago. In comparison, research on NOMA based on SC–SIC in a BC has existed for several decades [11, 12]. Up to now, the capacity region of the K-user MISO BC with imperfect CSIT remains an open problem. Instead, attention has shifted to characterizing its DoF region. Surprisingly, the information theoretic upper bound on the sum-DoF of the K-user underloaded MISO BC with imperfect CSIT derived in [28] has been shown to coincide with the sum-DoF achieved by linearly precoded RS with SIC [29, 30]. It is further proved in [31] that RS achieves the entire DoF region of the underloaded MISO BC with imperfect CSIT. In comparison, the sum-DoFs achieved by SDMA based on MU–LP and multi-antenna NOMA are suboptimal. 
The DoF benefits of RS with imperfect CSIT have also been studied in the underloaded MISO IC [32] and the underloaded MIMO IC/BC [33]. The optimum symmetric DoF (also known as max–min DoF) of RS has been studied in [34] for the underloaded MISO BC with imperfect CSIT, where RS achieves a higher symmetric DoF than SDMA based on MU–LP. Moving toward the overloaded scenario, the power-partitioning approach that superimposes degraded symbols for no-CSIT users on top of linearly precoded RS symbols for partial-CSIT users has been shown to achieve the entire DoF region of the K-user overloaded MISO BC with imperfect CSIT of heterogeneous qualities. When CSIT is perfect, the symmetric DoF achieved by RS has been shown in [17] to outperform that of SDMA based on MU–LP and NOMA based on SC–SIC for the K-user overloaded MISO BC. To further capture the diversity of channel strengths among users, the generalized DoF (GDoF) has been introduced [26]. The GDoF region of a two-user underloaded MISO BC with imperfect CSIT has been studied in [35, 36], where RS is considered as part of the interference enhancement scheme to achieve the entire GDoF region.

Table 3.1 Summary of information theoretic literature on rate-splitting

The DoF and GDoF superiority of RSMA over SDMA based on MU–LP and NOMA discovered in the information theoretic literature motivates its recent study in the finite SNR regime for rate enhancement in practical wireless communication systems. The communication literature on RSMA is summarized in Table 3.2. The DoF improvement of RSMA over SDMA with imperfect CSIT has been reflected in the rate performance in the finite SNR regime according to recent studies [17, 30, 34, 37, 38]. In the presence of quantized feedback, RS reduces the CSIT feedback overhead compared to MU–LP when using random beamforming for the common stream and zero-forcing beamforming (ZFBF) for the private streams [38]. It is further shown that with optimized precoders, RS outperforms MU–LP in the underloaded MISO BC with imperfect CSIT for ergodic sum rate maximization [30] and worst-case rate optimization (max–min rate) [34]. In the overloaded scenario, RS with the power-partitioning strategy has been shown to outperform its time-partitioning counterpart at finite SNR in the overloaded MISO BC with heterogeneous CSIT [37], using low-complexity maximum ratio transmission (MRT), or matched filtering, beamforming for the common stream and regularized zero-forcing (RZF) beamforming for the private streams. The 2-layer hierarchical RS (HRS) that relies on multiple common messages decoded by different groups of users is proposed in [39] for massive MIMO. Furthermore, the generalized RS scheme of RSMA that embraces 1-layer RS and 2-layer HRS as subcases is proposed in [42] for MISO BC with perfect CSIT, where RSMA shows clear rate region and weighted sum rate (WSR) improvements over SDMA and NOMA. The comparison among SDMA, NOMA, and RSMA is further analyzed in the two-user case with optimized precoders to maximize energy efficiency [45] or with low-complexity precoding but optimal power allocation for the common and private parts of the user messages [58]. 
Besides the above studies of RSMA in MISO BC, the transceiver design of RS has been studied in other applications of multi-antenna BC, such as MISO BC with hardware impairment [40], multigroup multicast [40], millimeter-wave (mmWave) systems [41], multi-pair relaying [47], cooperative multicell MISO BC [49], cloud-radio access networks (C-RAN) [52, 53], unmanned aerial vehicle (UAV)-assisted networks [50, 51], simultaneous wireless information and power transfer (SWIPT) [54], cooperative user relaying networks [56, 61], non-orthogonal unicast and multicast [57], multi-carrier systems [59], and so on. All of the above works consider linearly precoded RS at the transmitter, and nonlinear precoder design of RS has been studied as Tomlinson–Harashima Precoded RS (THPRS) [46] and dirty paper coded RS (DPCRS) [62]. Moving toward MIMO BC, different linear combining techniques are studied in [60] with minimum mean-square error (MMSE) combiner showing the best performance.

Table 3.2 Summary of communication literature on rate-splitting

2.2 RSMA Framework

RSMA is a generalized multiple access technique that explores a larger rate region and leaves more room for QoS enhancement. In the framework of RSMA, there are three commonly studied schemes in the literature, namely, 1-layer RS, 2-layer HRS, and the generalized RS, which are all specified in this section.

2.2.1 1-Layer RS

1-layer RS is the simplest RSMA scheme and it is the building block of the entire RSMA framework. It has been widely studied in the literature of RS in multi-antenna BC and its applications [17, 30, 34, 37, 38, 40,41,42, 45, 47, 50, 54, 56,57,58,59, 61] with both perfect and imperfect CSIT. Figure 3.4 illustrates the transmission model of K-user 1-layer RS with one base station (BS) equipped with \(N_t\) transmit antennas simultaneously serving K single-antenna users. The users are indexed by \(\mathcal {K}=\{1,\cdots ,K\}\).

Fig. 3.4 Transmission model of K-user 1-layer RS

At the transmitter, the K messages \(W_1,\ldots ,W_K\) intended for the K users are passed to the message splitter. The message of each user \(W_k, k\in \mathcal {K}\) is split into one common part \(W_{c,k}\) and one private part \(W_{p,k}\) (see Footnote 3). The common parts \(W_{c,1},\ldots ,W_{c,K}\) are combined into the common message \(W_c\) and encoded into the common stream \(s_c\) to be decoded by all users. The private parts \(W_{p,1},\ldots ,W_{p,K}\) are independently encoded into K private streams \(s_1,\ldots ,s_K\) to be decoded by the corresponding users only. The encoded stream vector \(\mathbf {{s}}=[ s_{c},s_{1},\ldots , s_{K}]^{T}\in \mathbb {C}^{(K+1)\times 1}\) is linearly precoded via precoding matrix \(\mathbf {{P}}=[ \mathbf {{p}}_{c}, \mathbf {{p}}_{1},\ldots , \mathbf {{p}}_{K}]\in \mathbb {C}^{N_t\times (K+1)}\) with \(\mathbf {{p}}_{k}\in \mathbb {C}^{N_t\times 1}, k\in \{c\}\cup \mathcal {K}\). The resulting transmit signal is

$$\displaystyle \begin{aligned} \mathbf{{x}}=\mathbf{{P}}\mathbf{{s}}=\mathbf{{p}}_{c}s_{c}+\sum_{k\in\mathcal{K}}\mathbf{{p}}_{k}s_{k}. \end{aligned} $$
(3.1)
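As a quick sanity check of (3.1), the superposition of one precoded common stream and K precoded private streams can be written out directly. The sketch below is only illustrative: the function name `transmit_signal` and the numerical precoders and symbols are assumptions chosen for easy arithmetic, with two antennas and two users.

```python
def transmit_signal(p_common, p_private, s_common, s_private):
    """Return x = p_c s_c + sum_k p_k s_k as in (3.1)."""
    # Start with the precoded common stream.
    x = [pc * s_common for pc in p_common]
    # Superimpose each precoded private stream.
    for p_k, s_k in zip(p_private, s_private):
        x = [xi + pi * s_k for xi, pi in zip(x, p_k)]
    return x

# Assumed example: N_t = 2 antennas, K = 2 users.
p_c = [1 + 0j, 1 + 0j]
p_1 = [1 + 0j, 0j]
p_2 = [0j, 1 + 0j]
s_c, s_1, s_2 = 1 + 0j, 1j, -1 + 0j

x = transmit_signal(p_c, [p_1, p_2], s_c, [s_1, s_2])
print(x)
```

Each entry of `x` is the signal sent from one antenna, so the list length equals the number of transmit antennas regardless of how many streams are superimposed.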

At user sides, the signal received at each user is

$$\displaystyle \begin{aligned} y_k&=\mathbf{{h}}_{k}^{H}\mathbf{x}+n_k\\ &=\mathbf{{h}}_{k}^{H}\mathbf{{p}}_{c}s_{c}+\mathbf{{h}}_{k}^{H}\mathbf{{p}}_{k}s_{k}+\sum_{j\in\mathcal{K},j\neq k}\mathbf{{h}}_{k}^{H}\mathbf{{p}}_{j}s_{j}+n_k, \end{aligned} $$
(3.2)

where \(\mathbf {{h}}_{k}\in \mathbb {C}^{N_t\times 1}\) is the channel between the BS and user-k. It may be perfectly known at the transmitter [42, 45, 49, 50, 53, 54, 56,57,58,59, 61] or partially known at the transmitter [17, 30, 34, 37,38,39,40,41, 46, 47, 57, 62] due to quantization error or feedback delay. The term \(n_k\) is the additive white Gaussian noise (AWGN) at user-k that follows the distribution \(\mathcal {C}\mathcal {N}(0,\sigma _{n,k}^2)\).

Each user first decodes the common stream \(s_c\) by treating the interference from all private streams as noise (see Footnote 4). The signal-to-interference-plus-noise ratio (SINR) of decoding the common stream \(s_c\) at user-k is

$$\displaystyle \begin{aligned} \gamma_{c,k}=\frac{\left|\mathbf{{h}}_{k}^{H}\mathbf{{p}}_{c}\right|{}^{2}}{\sum_{j\in\mathcal{K}}\left|\mathbf{{h}}_{k}^{H}\mathbf{{p}}_{j}\right|{}^{2}+\sigma_{n,k}^2}. \end{aligned} $$
(3.3)

To ensure all users can successfully decode the common stream, its achievable rate should not exceed

$$\displaystyle \begin{aligned} R_{c}=\min\left\{\log_{2}\left(1+\gamma_{c,1}\right),\cdots,\log_{2}\left(1+\gamma_{c,K}\right)\right\}. \end{aligned} $$
(3.4)

Note that \(R_c\) is shared by all K users. Denoting by \(C_k\) the part of the rate allocated to user-k for the transmission of \(W_{c,k}\), we have

$$\displaystyle \begin{aligned} \sum_{k\in\mathcal{K}}C_k=R_{c}. \end{aligned} $$
(3.5)

Once \(s_c\) is successfully decoded, it is re-encoded, precoded, and subtracted from \(y_k\). Each user then decodes its intended private stream \(s_k\) by treating the interference from the private streams of other users as noise (see Footnote 5). The SINR of decoding the private stream \(s_k\) at user-k is

$$\displaystyle \begin{aligned} \gamma_{k}=\frac{\left|\mathbf{{h}}_{k}^{H}\mathbf{{p}}_{k}\right|{}^{2}}{\sum_{j\in\mathcal{K},j\neq k}\left|\mathbf{{h}}_{k}^{H}\mathbf{{p}}_{j}\right|{}^{2}+\sigma_{n,k}^2}. \end{aligned} $$
(3.6)

Its corresponding private rate is \( R_{k}=\log _{2}\left (1+\gamma _{k}\right ) \). Hence, the total achievable rate of user-\(k, k\in \mathcal {K}\) is

$$\displaystyle \begin{aligned} R_{k,tot}=C_{k}+R_{k}. \end{aligned} $$
(3.7)
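Equations (3.3) to (3.7) can be evaluated together for a given set of precoders. The sketch below assumes perfect SIC, the same noise power at every user, and a hypothetical argument `common_shares` holding the fractions of \(R_c\) assigned to each user (they are assumed to sum to one so that (3.5) holds). The orthogonal two-user channel and unit-power precoders are assumed values chosen for easy arithmetic.

```python
import math

def inner(h, p):
    """Return h^H p for two equal-length complex vectors."""
    return sum(hi.conjugate() * pi for hi, pi in zip(h, p))

def one_layer_rs_rates(channels, p_common, p_private, noise_power, common_shares):
    """Evaluate (3.3)-(3.7) for 1-layer RS with fixed precoders."""
    common_sinrs, private_rates = [], []
    for k, h in enumerate(channels):
        private_gains = [abs(inner(h, p)) ** 2 for p in p_private]
        # (3.3): every private stream is noise while the common stream is decoded.
        common_gain = abs(inner(h, p_common)) ** 2
        common_sinrs.append(common_gain / (sum(private_gains) + noise_power))
        # (3.6): after SIC only the other users' private streams interfere.
        others = sum(g for j, g in enumerate(private_gains) if j != k)
        private_rates.append(math.log2(1 + private_gains[k] / (others + noise_power)))
    # (3.4): the common stream must be decodable by every user.
    r_common = min(math.log2(1 + g) for g in common_sinrs)
    # (3.5) and (3.7): share R_c among users and add the private rates.
    totals = [share * r_common + r for share, r in zip(common_shares, private_rates)]
    return r_common, private_rates, totals

channels = [[1 + 0j, 0j], [0j, 1 + 0j]]
p_private = [[1 + 0j, 0j], [0j, 1 + 0j]]

r_c, r_priv, r_tot = one_layer_rs_rates(
    channels, [1 + 0j, 1 + 0j], p_private, 1.0, [0.5, 0.5])
print(r_c, r_priv, r_tot)

# Switching the common stream off reduces the scheme to SDMA based on MU-LP.
r_c0, r_priv0, r_tot0 = one_layer_rs_rates(
    channels, [0j, 0j], p_private, 1.0, [0.5, 0.5])
print(r_c0, r_tot0)
```

The second call illustrates the special-case relationship: with no power on the common stream, the total rates coincide with the private rates, which are exactly the MU–LP rates for the same private precoders.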

Following the above-described structure of 1-layer RS, we can design the precoders \(\mathbf {{p}}_{c}, \mathbf {{p}}_{1},\ldots , \mathbf {{p}}_{K}\) with different objectives, such as maximizing the WSR (or sum rate) [30, 42], maximizing the worst-case user rate [34], maximizing energy efficiency (EE) [45], minimizing transmit power [44], etc.
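To give a feel for what such a design problem looks like, the following sketch performs a coarse grid search over the fraction `t` of transmit power given to the common stream, with the beam directions held fixed. This is a deliberately simplified stand-in and not the optimization method of the cited works: the fixed MRT private directions, the common direction along the sum of the channels, the equal split of private power, the channel values, and the grid resolution are all assumptions made for illustration.

```python
import math

def inner(h, p):
    """Return h^H p for two equal-length complex vectors."""
    return sum(hi.conjugate() * pi for hi, pi in zip(h, p))

def unit(v):
    """Return v scaled to unit norm."""
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / norm for x in v]

def sum_rate(channels, common_dir, private_dirs, total_power, t, noise_power=1.0):
    """Sum rate R_c + sum_k R_k when a fraction t of the power goes to s_c."""
    num_users = len(channels)
    p_c = [math.sqrt(t * total_power) * x for x in common_dir]
    amp = math.sqrt((1 - t) * total_power / num_users)
    p_k = [[amp * x for x in d] for d in private_dirs]
    common_rates, private_rates = [], []
    for k, h in enumerate(channels):
        gains = [abs(inner(h, p)) ** 2 for p in p_k]
        common_gain = abs(inner(h, p_c)) ** 2
        common_rates.append(math.log2(1 + common_gain / (sum(gains) + noise_power)))
        others = sum(g for j, g in enumerate(gains) if j != k)
        private_rates.append(math.log2(1 + gains[k] / (others + noise_power)))
    return min(common_rates) + sum(private_rates)

# Two nearly aligned user channels (assumed values), total power 100.
channels = [[1 + 0j, 0.2 + 0j], [1 + 0j, 0.4 + 0j]]
private_dirs = [unit(h) for h in channels]              # MRT directions
common_dir = unit([a + b for a, b in zip(*channels)])   # along h_1 + h_2

grid = [i / 20 for i in range(21)]
best_t = max(grid, key=lambda t: sum_rate(channels, common_dir, private_dirs, 100.0, t))
best_rate = sum_rate(channels, common_dir, private_dirs, 100.0, best_t)
sdma_rate = sum_rate(channels, common_dir, private_dirs, 100.0, 0.0)
print(best_t, best_rate, sdma_rate)
```

Because `t = 0` is part of the grid, the search can never do worse than the SDMA special case with the same private directions; for the nearly aligned channels assumed here, moving power to the common stream gives a clearly higher sum rate.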

2.2.2 2-Layer HRS

2-layer HRS was originally introduced for massive MIMO networks [39] with the aim of enhancing the achievable rate of all users and reducing the CSI feedback. In a K-user 2-layer HRS network, the K users are divided into G separate groups indexed by \(\mathcal {G}=\{1,\ldots ,G\}\), where \(\mathcal {K}_g,g\in \mathcal {G}\) denotes the set of users in group-g such that \(\bigcup _{g\in \mathcal {G}}\mathcal {K}_g=\mathcal {K}\). Different from 1-layer RS where the message of each user is only split into two parts, the message \(W_k, k\in \mathcal {K}_g\) of each user in 2-layer HRS is split into three different parts \(W_k^{\mathcal {K}}, W_k^{\mathcal {K}_g}, W_k^k\) in order to form an outer-group common message and an inner-group common message. The outer-group common messages \(\{W_k^{\mathcal {K}}|k\in \mathcal {K}\}\) of all users are jointly combined into one common message \(W_{\mathcal {K}}\) and encoded into the outer-group common stream \(s_{\mathcal {K}}\) to be decoded by all users. The inner-group common messages \(\{W_k^{\mathcal {K}_g}|k\in \mathcal {K}_g\}\) of users in group-g are jointly combined into the common message \(W_{\mathcal {K}_g}\) and encoded into the inner-group common stream \(s_{\mathcal {K}_g}\) to be decoded by all users in \(\mathcal {K}_g\). The private messages \(\{W_k^{k}|k\in \mathcal {K}\}\) are independently encoded into the private streams \(s_1,\ldots ,s_K\) for the corresponding users only. The encoded streams \(\mathbf {s}=[s_{\mathcal {K}},s_{\mathcal {K}_1},\cdots ,s_{\mathcal {K}_G},s_{1},\cdots ,s_K]^T\in \mathbb {C}^{(K+G+1)\times 1}\) are linearly precoded via precoding matrix \(\mathbf {{P}}=[ \mathbf {{p}}_{\mathcal {K}}, \mathbf {{p}}_{\mathcal {K}_1},\ldots ,\mathbf {{p}}_{\mathcal {K}_G},\mathbf {{p}}_{1},\ldots , \mathbf {{p}}_{K}]\in \mathbb {C}^{N_t\times (K+G+1)}\). The resulting transmit signal is

$$\displaystyle \begin{aligned} \mathbf{{x}}=\mathbf{{P}}\mathbf{{s}}=\mathbf{{p}}_{\mathcal{K}}s_{\mathcal{K}}+\sum_{g\in\mathcal{G}}\mathbf{{p}}_{\mathcal{K}_g}s_{\mathcal{K}_g}+\sum_{k\in\mathcal{K}}\mathbf{{p}}_{k}s_{k}. \end{aligned} $$
(3.8)

At user sides, once each user receives the signal as \(y_k=\mathbf {{h}}_{k}^{H}\mathbf {x}+n_k\), it employs two layers of SIC to successfully decode \(s_{\mathcal {K}}\), \(s_{\mathcal {K}_g}\) and \(s_{k}, k\in \mathcal {K}_g\). The outer-group common stream \(s_{\mathcal {K}}\) is decoded first at all users by treating the interference from all other streams as noise. The corresponding SINR of decoding \(s_{\mathcal {K}}\) at user-k is

$$\displaystyle \begin{aligned} \gamma_{k}^{\mathcal{K}}=\frac{\left|\mathbf{{h}}_{k}^{H}\mathbf{{p}}_{\mathcal{K}}\right|{}^{2}}{\sum_{g\in\mathcal{G}}\left|\mathbf{{h}}_{k}^{H}\mathbf{{p}}_{\mathcal{K}_g}\right|{}^{2}+\sum_{j\in\mathcal{K}}\left|\mathbf{{h}}_{k}^{H}\mathbf{{p}}_{j}\right|{}^{2}+\sigma_{n,k}^2}. \end{aligned} $$
(3.9)

Once \(s_{\mathcal {K}}\) is successfully decoded with its contributed part removed from the received signal, each user then decodes the inner-group common stream \(s_{\mathcal {K}_g}\) by treating interference from other inner-group common streams and private streams as noise. The SINR of decoding \(s_{\mathcal {K}_g}\) at user-k is

$$\displaystyle \begin{aligned} \gamma_{k}^{\mathcal{K}_g}=\frac{\left|\mathbf{{h}}_{k}^{H}\mathbf{{p}}_{\mathcal{K}_g}\right|{}^{2}}{\sum_{g'\in\mathcal{G},g'\neq g}\left|\mathbf{{h}}_{k}^{H}\mathbf{{p}}_{\mathcal{K}_{g'}}\right|{}^{2}+\sum_{j\in\mathcal{K}}\left|\mathbf{{h}}_{k}^{H}\mathbf{{p}}_{j}\right|{}^{2}+\sigma_{n,k}^2}. \end{aligned} $$
(3.10)

After removing \(s_{\mathcal {K}_g}\) from the received signal, user-k decodes its private stream \(s_k\). The SINR of decoding the private stream \(s_k\) at user-k is

$$\displaystyle \begin{aligned} \gamma_{k}=\frac{\left|\mathbf{{h}}_{k}^{H}\mathbf{{p}}_{k}\right|{}^{2}}{\sum_{g'\in\mathcal{G},g'\neq g}\left|\mathbf{{h}}_{k}^{H}\mathbf{{p}}_{\mathcal{K}_{g'}}\right|{}^{2}+\sum_{j\in\mathcal{K},j\neq k}\left|\mathbf{{h}}_{k}^{H}\mathbf{{p}}_{j}\right|{}^{2}+\sigma_{n,k}^2}. \end{aligned} $$
(3.11)

Following (3.4) and (3.5), we obtain the respective achievable rates of \(s_{\mathcal {K}}\), \(s_{\mathcal {K}_g}\), and \(s_k\), which are given by

$$\displaystyle \begin{aligned} \sum_{k\in\mathcal{K}}C_k^{\mathcal{K}}&=\min\left\{\log_{2}\left(1+\gamma_{k'}^{\mathcal{K}}\right)|k'\in \mathcal{K}\right\},\\ \sum_{k\in\mathcal{K}_g}C_k^{\mathcal{K}_g}&=\min\left\{\log_{2}\left(1+\gamma_{k'}^{\mathcal{K}_g}\right)|k'\in \mathcal{K}_g\right\}, \forall g\in\mathcal{G}\\ R_k&=\log_{2}\left(1+\gamma_{k}\right), \forall k\in\mathcal{K}. \end{aligned} $$
(3.12)

where \(C_k^{\mathcal {K}}\) and \(C_k^{\mathcal {K}_g}\) are the parts of the rate allocated to user-k for the transmission of messages \(W_k^{\mathcal {K}}\) and \(W_k^{\mathcal {K}_g}\), respectively. The total achievable rate of user-\(k, k \in \mathcal {K}_g\) is

$$\displaystyle \begin{aligned} R_{k,tot}=C_{k}^{\mathcal{K}}+C_{k}^{\mathcal{K}_g}+R_{k}. \end{aligned} $$
(3.13)
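As a numerical illustration of (3.9)–(3.12), the following minimal sketch evaluates the stream rates of a 2-layer HRS transmission for given channels and precoders. The function names `gain` and `hrs_rates`, the zero-based user indices, and the default unit noise variance are assumptions made for this example; the precoders are inputs and are not optimized here.

```python
import math

def gain(h, p):
    """Return |h^H p|^2 for complex vectors given as Python lists."""
    return abs(sum(hi.conjugate() * pi for hi, pi in zip(h, p))) ** 2

def hrs_rates(H, groups, p_outer, p_inner, p_priv, noise=1.0):
    """Stream rates of 2-layer HRS following (3.9)-(3.12).

    H       : list of K channel vectors h_k
    groups  : list of G lists of user indices (a partition of 0..K-1)
    p_outer : precoder of the outer-group common stream
    p_inner : list of G precoders of the inner-group common streams
    p_priv  : list of K precoders of the private streams
    Returns (outer common rate, inner common rate per group, private rates).
    """
    K = len(H)
    group_of = {k: g for g, members in enumerate(groups) for k in members}
    sinr_outer, sinr_inner, rate_priv = [], [], []
    for k in range(K):
        g = group_of[k]
        inner_all = sum(gain(H[k], p) for p in p_inner)
        inner_other = sum(gain(H[k], p) for i, p in enumerate(p_inner) if i != g)
        priv_all = sum(gain(H[k], p) for p in p_priv)
        priv_other = sum(gain(H[k], p) for j, p in enumerate(p_priv) if j != k)
        # (3.9): all inner-group common and private streams act as noise.
        sinr_outer.append(gain(H[k], p_outer) / (inner_all + priv_all + noise))
        # (3.10): the outer-group common stream has been removed by SIC.
        sinr_inner.append(gain(H[k], p_inner[g]) / (inner_other + priv_all + noise))
        # (3.11): the own inner-group common stream has been removed as well.
        sinr_k = gain(H[k], p_priv[k]) / (inner_other + priv_other + noise)
        rate_priv.append(math.log2(1 + sinr_k))
    # (3.12): a common stream must be decodable by every user that decodes it.
    rate_outer = min(math.log2(1 + s) for s in sinr_outer)
    rate_inner = [min(math.log2(1 + sinr_inner[k]) for k in members)
                  for members in groups]
    return rate_outer, rate_inner, rate_priv
```

The total rate of each user in (3.13) then follows by splitting the returned common rates among the users of the corresponding stream and adding each user's private rate.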

Figure 3.5 illustrates the transmission model of four-user 2-layer HRS with one BS equipped with \(N_t\) transmit antennas simultaneously serving four single-antenna users. There are two user groups, with user-1 and user-2 in group 1 and user-3 and user-4 in group 2. \(s_{1234}\) is the outer-group common stream to be decoded by all four users, while \(s_{12}\) and \(s_{34}\) are the two inner-group common streams to be decoded only by the users within the corresponding groups. The receiver structures of user-2 and user-4 follow those of user-1 and user-3, respectively.

Fig. 3.5 Transmission model of four-user 2-layer HRS

2.2.3 Generalized RS

The generalized RS framework was proposed in [42] with the aim of identifying the largest room for rate and QoS enhancement at the expense of more layers of SIC at each user. In the K-user generalized RS framework, the number of message splits of each user increases with K so as to form common streams intended for different user subsets of \(\mathcal {K}\). For any user subset \(\mathcal {A}\subseteq \mathcal {K}\), the BS transmits a data stream \(s_{\mathcal {A}}\) by loading messages of all users in the subset \(\mathcal {A}\), and \(s_{\mathcal {A}}\) needs to be decoded by all users in the subset \(\mathcal {A}\) while being treated as noise by the other users. The message of user-k is split into \(2^{K-1}\) parts as \(\{ W_k^{\mathcal {A}'} | \mathcal {A}' \subseteq \mathcal {K}, k \in \mathcal {A}' \}\). User messages \(\{W_{k'}^{\mathcal {A}}|k'\in \mathcal {A}\}\) with the same superscript \(\mathcal {A}\) are encoded together into the stream \(s_{\mathcal {A}}\).

The concept of stream order is introduced here to simplify the explanation. We define the streams to be decoded by l users as l-order streams. Hence, the common stream \(s_{\mathcal {K}}\) intended for all users is a K-order stream, while the private stream \(s_k\) is a 1-order stream since it is only decoded by a single user. In the K-user case, all l-order streams form the stream set \(\{s_{\mathcal {A}'}|\mathcal {A}'\subseteq \mathcal {K},|\mathcal {A}'|=l\}\), and there are in total \({K\choose l}\) elements within the set. Specifically, there is one K-order stream \(s_{\mathcal {K}}\), and there are K 1-order streams \(s_1,\ldots ,s_K\). We further introduce the l-order data stream vector formed by all l-order streams as \({\mathbf {s}}_l\in \mathbb {C}^{{K\choose l}\times 1}\). Note that when l = K, there is one element within the set, and therefore, \({\mathbf {s}}_K\) reduces to \(s_{\mathcal {K}}\). \({\mathbf {s}}_l\) is then linearly precoded via the precoding matrix \({\mathbf {P}}_l\) formed by \( \{{\mathbf {p}}_{\mathcal {A}'}|\mathcal {A}'\subseteq \mathcal {K},|\mathcal {A}'|=l\}\), and the resulting transmit signal is

$$\displaystyle \begin{aligned} \mathbf{{x}}=\sum_{l=1}^{K}\mathbf{{P}}_{{l}}\mathbf{{s}}_{{l}}=\sum_{l=1}^{K}\sum_{\mathcal{A}'\subseteq\mathcal{K},|\mathcal{A}'|=l}\mathbf{{p}}_{\mathcal{A}'}{{s}}_{\mathcal{A}'}. \end{aligned} $$
(3.14)
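The stream structure behind (3.14) can be checked by direct enumeration. The sketch below lists the l-order streams as user subsets and counts the streams intended for one user; the function names and the use of tuples of 1-based user indices to represent \(\mathcal {A}'\) are illustrative assumptions.

```python
from itertools import combinations
from math import comb

def streams_by_order(K):
    """Map each stream order l to the list of user subsets A' with |A'| = l."""
    users = range(1, K + 1)
    return {l: list(combinations(users, l)) for l in range(1, K + 1)}

def streams_for_user(K, k):
    """All subsets A' containing user k, i.e., the streams user k decodes."""
    return [A for subsets in streams_by_order(K).values() for A in subsets if k in A]

K = 3
by_order = streams_by_order(K)
# One count per order: K-choose-l streams of order l.
counts = {l: len(subsets) for l, subsets in by_order.items()}
assert all(counts[l] == comb(K, l) for l in counts)
# The transmitter sends 2^K - 1 streams; each user is involved in 2^(K-1) of them.
total_streams = sum(counts.values())
streams_user_1 = streams_for_user(K, 1)
```

For K = 3 this gives three 1-order streams, three 2-order streams, and one 3-order stream, and user-1 is involved in the four streams indexed by {1}, {1,2}, {1,3}, and {1,2,3}, which matches the \(2^{K-1}\) message parts per user.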

At user sides, each user requires \(2^{K-1}-1\) layers of SIC to sequentially decode all the intended common streams. The decoding process starts from the K-order stream and then goes down to the 1-order private stream. Note that each user is involved in multiple l-order streams except the 1-order and K-order streams, and the set of l-order streams to be decoded at user-k is \(\mathcal {S}_{l,k}=\{s_{\mathcal {A}'}|\mathcal {A}'\subseteq \mathcal {K},|\mathcal {A}'|=l,k\in \mathcal {A}'\}\). We denote the decoding order of the l-order streams \({\mathbf {s}}_l\) at all users as \(\pi _l\). Based on one certain decoding order \(\pi _l\), we obtain the l-order stream vector to be decoded at user-k as \({\mathbf {s}}_{\pi _{l,k}}=[s_{\pi _{l,k}{(1)}},\cdots ,s_{\pi _{l,k}{(|\mathcal {S}_{l,k}|)}}]^H\), where we assume \(s_{\pi _{l,k}{(i)}}\) is decoded before \(s_{\pi _{l,k}{(j)}}\) if i < j. The SINR of user-k to decode the l-order stream \({s}_{\pi _{l,k}{(i)}}\) is

$$\displaystyle \begin{aligned} \gamma_{k}^{\pi_{l,k}{(i)}}=\frac{|\mathbf{{h}}_{k}^{H}\mathbf{{p}}_{\pi_{l,k}{(i)}}|{}^{2}}{I_{\pi_{l,k}{(i)}}+\sigma_{n,k}^2}, \end{aligned} $$
(3.15)

where

$$\displaystyle \begin{aligned} I_{\pi_{l,k}{(i)}}&=\sum_{j>i}|\mathbf{{h}}_{k}^{H}\mathbf{{p}}_{\pi_{l,k}(j)}|{}^{2}+\sum_{l'=1}^{l-1}\sum_{j=1}^{|\mathcal{S}_{l',k}|}|\mathbf{{h}}_{k}^{H}\mathbf{{p}}_{\pi_{l',k}(j)}|{}^{2} +\sum_{\mathcal{A}'\subseteq\mathcal{K},k\notin\mathcal{A}'}|\mathbf{{h}}_{k}^{H}\mathbf{{p}}_{{\mathcal{A}'}}|{}^{2} \end{aligned}$$

is the interference received at user-k when decoding \({s}_{\pi _{l,k}{(i)}}\). The first term \(\sum _{j>i}|\mathbf {{h}}_{k}^{H}\mathbf {{p}}_{\pi _{l,k}(j)}|{ }^{2}\) is the interference from the remaining non-decoded l-order streams in \({\mathbf {s}}_{{\pi _{l,k}}}\). The second term \(\sum _{l'=1}^{l-1}\sum _{j=1}^{|\mathcal {S}_{l',k}|}|\mathbf {{h}}_{k}^{H}\mathbf {{p}}_{\pi _{l',k}(j)}|{ }^{2}\) is the interference from lower-order streams \(\{{\mathbf {s}}_{{\pi _{l',k}}}| l'<l\}\) to be decoded at user-k, while the third term \(\sum _{\mathcal {A}'\subseteq \mathcal {K},k\notin \mathcal {A}'}|\mathbf {{h}}_{k}^{H}\mathbf {{p}}_{{\mathcal {A}'}}|{ }^{2}\) is the interference received from the streams that are not intended for user-k. The corresponding achievable rate of user-k for the data stream \({s}_{\pi _{l,k}{(i)}}\) is \( R_k^{\pi _{l,k}{(i)}}=\log _{2}(1+\gamma _{k}^{\pi _{l,k}{(i)}})\). Following (3.4), (3.5), and (3.12), the achievable rate of the \(|\mathcal {A}|\)-order stream \(s_{\mathcal {A}}\) (\(\mathcal {A}\subseteq \mathcal {K},2\leq |\mathcal {A}|\leq K\)) shall not exceed

$$\displaystyle \begin{aligned} \sum_{k\in \mathcal{A}}C_{k}^{\mathcal{A}}=\min_{k'}\left\{ R_{k'}^{\mathcal{A}}\mid k'\in\mathcal{A}\right\}. \end{aligned} $$
(3.16)

where \(C_k^{\mathcal {A}}\) is the part of the common rate allocated to user-k \((k\in \mathcal {A})\) for the transmission of \(W_k^{\mathcal {A}}\) via \(s_{\mathcal {A}}\). Hence, the total achievable rate of user-k is

$$\displaystyle \begin{aligned} R_{k,tot}=\sum_{\mathcal{A}'\subseteq\mathcal{K},k\in \mathcal{A}'}C_k^{\mathcal{A}'}+R_k. \end{aligned} $$
(3.17)

Figure 3.6 illustrates a three-user example of the generalized RS. The message of each user is split into four parts, e.g., the message of user-1 is split into \(\{W_{1}^{123}, W_{1}^{12}, W_{1}^{13}, W_{1}^{1}\}\). There is one 3-order stream, three 2-order streams, and three 1-order streams. The corresponding stream vectors are denoted as \(s_{123}\), \({\mathbf {s}}_2=[s_{12},s_{13},s_{23}]^T\), and \({\mathbf {s}}_1=[s_1,s_2,s_3]^T\), respectively. Each user requires three layers of SIC to sequentially decode the intended streams. The instance of decoding order \(\pi _2\) for the 2-order streams illustrated in Fig. 3.6 is \(12\rightarrow 13\rightarrow 23\). All users follow the rule that \(s_{12}\) is decoded before \(s_{13}\), and \(s_{23}\) is decoded last. At user-1, only the 2-order streams \(s_{12}\) and \(s_{13}\) are decoded. The decoding order based on \(\pi _2\) at user-1 is \(\pi _{2,1}=12\rightarrow 13\). We have \(s_{\pi _{2,1}{(1)}}=s_{12}\) and \(s_{\pi _{2,1}{(2)}}=s_{13}\).
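The way a common decoding order is restricted to a single user in this three-user example can be sketched as follows. The function name `decoding_sequence`, the tuple representation of streams, and the default lexicographic order for stream orders without a specified decoding order are assumptions of this illustration.

```python
from itertools import combinations

def decoding_sequence(K, k, orders=None):
    """Streams decoded at user k, from the K-order stream down to its private stream.

    orders maps a stream order l to the common decoding order pi_l (a list of
    user tuples). Orders that are not given default to lexicographic order.
    """
    orders = orders or {}
    sequence = []
    for l in range(K, 0, -1):
        pi_l = orders.get(l, list(combinations(range(1, K + 1), l)))
        # pi_{l,k}: keep only the l-order streams that involve user k.
        sequence.extend(A for A in pi_l if k in A)
    return sequence

pi_2 = [(1, 2), (1, 3), (2, 3)]          # decoding order 12 -> 13 -> 23
seq_user_1 = decoding_sequence(3, 1, {2: pi_2})
seq_user_3 = decoding_sequence(3, 3, {2: pi_2})
sic_layers_user_1 = len(seq_user_1) - 1  # the last stream needs no further SIC
```

Here `seq_user_1` is `[(1, 2, 3), (1, 2), (1, 3), (1,)]`, so user-1 decodes four streams and needs three layers of SIC, in agreement with the example above.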

Fig. 3.6 Transmission model of three-user generalized RS

2.2.4 1-Layer RS vs. 2-Layer HRS vs. Generalized RS

The inclusive relation of the above three RSMA schemes is illustrated in Fig. 3.7. The generalized RS is the most general scheme that embraces 2-layer HRS and 1-layer RS as two sub-schemes. 2-layer HRS is a sub-scheme when only the K-order stream, \(|\mathcal {K}_g|\)-order streams, and 1-order streams are active (with a nonzero power allocation) in the generalized RS, while 1-layer RS is the sub-scheme of 2-layer HRS when only the K-order stream and 1-order streams are active. All other inactive streams are allocated with zero transmit power.

Fig. 3.7 RSMA framework and the schemes included

In terms of the computational complexity and hardware complexity at the BS and users, 1-layer RS has the lowest complexity, and the generalized RS has the highest complexity in the RSMA framework. In the K-user case, each user in the 1-layer RS system only requires one layer of SIC without any scheduling requirement at the transmitter, while in 2-layer HRS each user requires two layers of SIC and the BS must additionally address user grouping. Both 1-layer RS and 2-layer HRS maintain relatively low transceiver complexity and are practical for implementation since the number of SIC layers deployed at each user is independent of the number of users. In comparison, the generalized RS is more complex to implement since the number of SIC layers increases exponentially with the number of users. In the K-user case, \(2^{K-1}-1\) layers of SIC are required at each user, and the decoding order of the common streams needs to be optimized at the transmitter. However, readers are reminded that the motivation for introducing the generalized RS differs from that of the previous two low-complexity schemes: it is to identify the best possible performance of the network at the expense of more SIC layers at the receivers.

2.3 RSMA vs. NOMA/SDMA/OMA

2.3.1 Framework Comparison

Table 3.3 summarizes the comparison of different multiple access techniques. Compared with the existing multiple access techniques, the major and unique characteristic of RSMA is its ability of partially decoding interference and partially treating interference as noise. Gaining benefits from its dynamic interference management capability, RSMA framework generalizes and encompasses multi-antenna NOMA (including SC–SIC and SC–SIC per group), SDMA (based on MU–LP), OMA (TDMA/FDMA), and multicasting as sub-schemes. Their relation is further illustrated in Fig. 3.8.

Fig. 3.8 RSMA framework and the schemes included

Table 3.3 Comparison of different multiple access techniques

SDMA is a sub-scheme of RSMA when all common streams are turned off and the transmit power is fully allocated to the private streams. RSMA boils down to NOMA (based on SC–SIC) when each common stream is fully encoded by the entire message of a single user. OMA is a sub-scheme of SDMA, NOMA, and RSMA when the transmit power is fully allocated to a single user. Physical-layer multicasting is achieved by encoding the messages of all users into the K-order stream. Hence, multicasting is a special instance of RSMA with full transmit power being allocated to the K-order common stream (embracing partial messages of all users). Based on the above discussion, we obtain that SDMA based on MU–LP, multi-antenna NOMA, OMA, and multicasting are sub-schemes of RSMA. Most importantly, RSMA smoothly bridges all sub-schemes without hard switching among them.

Figure 3.9 illustrates the different mappings of the messages to the streams in the two-user case for all multiple access techniques. When K = 2, the generalized RS boils down to 1-layer RS automatically with one common stream \(s_c\) containing one part \(W_{c,1}\) of the message of user-1 and one part \(W_{c,2}\) of the message of user-2. The other parts \(W_{p,1}\) and \(W_{p,2}\) are independently encoded into the private streams \(s_1\) and \(s_2\). SDMA is obtained by allocating no power to the common stream (\(\left \|{\mathbf {p}}_c \right \|{ }^2=0\)) such that \(W_k\) is encoded into \(s_k\) directly. NOMA is obtained by encoding the message of one user, e.g., \(W_2\), entirely into \(s_c\) and \(W_1\) into \(s_1\) while \(s_2\) is turned off (\(\left \|{\mathbf {p}}_2\right \|{ }^2=0\)). In this example, user-1 fully decodes the interference from the message of user-2. OMA is obtained when only one user is scheduled (\(\left \|{\mathbf {p}}_c \right \|{ }^2=\left \|{\mathbf {p}}_2 \right \|{ }^2=0\)). Multicasting is obtained when the messages of both users \(W_1, W_2\) are combined into \(s_c\) and the private streams are turned off (\(\left \|{\mathbf {p}}_1 \right \|{ }^2=\left \|{\mathbf {p}}_2 \right \|{ }^2=0\)).
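The two-user mappings described above can be written down as a small lookup table. The dictionary layout, the string labels, and the helper `inactive_streams` are illustrative assumptions; an empty message list stands for a stream whose precoder has zero power.

```python
# Message-to-stream mapping in the two-user case, one entry per scheme.
TWO_USER_MAPPINGS = {
    "RSMA":         {"s_c": ["W_c,1", "W_c,2"], "s_1": ["W_p,1"], "s_2": ["W_p,2"]},
    "SDMA":         {"s_c": [],                 "s_1": ["W_1"],   "s_2": ["W_2"]},
    "NOMA":         {"s_c": ["W_2"],            "s_1": ["W_1"],   "s_2": []},
    "OMA":          {"s_c": [],                 "s_1": ["W_1"],   "s_2": []},
    "Multicasting": {"s_c": ["W_1", "W_2"],     "s_1": [],        "s_2": []},
}

def inactive_streams(scheme):
    """Streams that carry no message and therefore receive zero transmit power."""
    mapping = TWO_USER_MAPPINGS[scheme]
    return sorted(stream for stream, messages in mapping.items() if not messages)
```

For example, `inactive_streams("OMA")` returns `["s_2", "s_c"]`, corresponding to \(\left \|{\mathbf {p}}_c \right \|^2=\left \|{\mathbf {p}}_2 \right \|^2=0\), while `inactive_streams("RSMA")` returns an empty list since all three streams are active.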

Fig. 3.9 Mapping of messages to streams

2.3.2 Complexity Comparison

The qualitative complexity of different strategies is compared in Table 3.4. SDMA based on MU–LP and OMA based on point-to-point linear precoding have the lowest receiver and encoder complexities. However, the scheduling complexity is relatively high due to the subcarrier/time-slot allocation for OMA and user selection for SDMA. As mentioned previously, SDMA based on MU–LP is only suited when the user channels are semi-orthogonal. Accurate CSIT is required to carefully design user scheduling for interference coordination.

Table 3.4 Qualitative comparison of the complexity of different multiple access techniques

Both SC–SIC per group and 2-layer HRS have the highest user grouping complexity. The total number of user grouping methods to be considered in both schemes is \(\sum _{k=1}^KS(K,k)\), where \(S(K,k)=\frac {1}{k!}\sum _{i=0}^k(-1)^i\binom {k}{i}(k-i)^K\), also known as a Stirling set number [65], is the total number of methods to partition a set of K elements into k nonempty sets. As each user in 2-layer HRS sequentially decodes the outer-group common stream and the inner-group common stream followed by the intended private stream, the decoding order is determined without introducing additional scheduling complexity and only requires two layers of SIC at each user. In comparison, at most K! decoding orders are required to be considered in SC–SIC per group for each grouping method, and each user is required to have \(|\mathcal {K}_g|-1\) layers of SIC. For example, for a four-user system with two groups and two users in each group, we have to consider three different user grouping methods and four different decoding orders for each grouping method. Generally, SC–SIC per group has the highest scheduling complexity compared with other schemes since the decoding order and user grouping are required to be jointly decided. Note that K! is the total number of decoding orders when there is one user group. In such scenario, SC–SIC per group reduces to SC–SIC. Different from the single-antenna NOMA in SISO BC where the optimal decoding order of NOMA is determined based on the channel gain, multi-antenna NOMA based on SC–SIC requires the decoding order to be jointly decided with the precoders at the transmitter. As SC–SIC is only suited for aligned user channels with certain channel strength disparities, additional scheduler complexity is introduced for a proper user scheduling algorithm. Hence, the scheduler complexity of SC–SIC is relatively high, and each user requires K − 1 layers of SIC in the K-user SC–SIC system. 
Compared with SC–SIC per group, SC–SIC simplifies the scheduling complexity at the transmitter (since there is no requirement of user grouping) but increases the receiver complexity.
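The grouping and ordering counts quoted above can be reproduced with a short sketch. The function names are illustrative. Taking the number of decoding orders for one grouping as the product of the factorials of the group sizes is an assumption, consistent with the bound K! for a single group and with the four-user example. The count of equal-size groupings uses the standard formula \(K!/((m!)^G\,G!)\) for G groups of m users.

```python
from math import comb, factorial

def stirling2(K, k):
    """Stirling set number S(K, k): partitions of K users into k nonempty groups."""
    total = sum((-1) ** i * comb(k, i) * (k - i) ** K for i in range(k + 1))
    return total // factorial(k)

def num_groupings(K):
    """Total number of user groupings, i.e., the sum of S(K, k) over k = 1..K."""
    return sum(stirling2(K, k) for k in range(1, K + 1))

def equal_size_groupings(G, m):
    """Number of ways to split G*m users into G unlabeled groups of m users each."""
    return factorial(G * m) // (factorial(m) ** G * factorial(G))

def decoding_orders(group_sizes):
    """Assumed count of decoding orders for one grouping: product of |K_g|!."""
    count = 1
    for size in group_sizes:
        count *= factorial(size)
    return count
```

With four users, `equal_size_groupings(2, 2)` gives the three groupings with two users per group and `decoding_orders([2, 2])` gives the four decoding orders per grouping, while `num_groupings(4)` shows that 15 groupings arise when the group sizes are not fixed.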

Compared with existing multiple access techniques, RSMA is able to achieve a better trade-off between performance and complexity. All RS strategies including 1-layer RS, 2-layer HRS, and generalized RS are suited for users with any channel strength disparity and any angle between the user channels. Specifically, 1-layer RS has the lowest scheduling complexity compared with all other schemes since it does not involve any user scheduling, grouping, or ordering. It also maintains very low receiver complexity since only one layer of SIC is required at each user in the K-user scenario. 1-layer RS is a sub-scheme of 2-layer HRS and the generalized RS. Compared with 1-layer RS, the complexity at the transmitter and receivers for 2-layer HRS is higher due to the higher-dimensional message splitting. The receiver complexity of 2-layer HRS is still low compared with other schemes since the number of SIC layers required in 1-layer RS and 2-layer HRS is independent of the number of users K. The receiver complexity is much reduced compared with SC–SIC, SC–SIC per group, or the generalized RS. Though the generalized RS achieves the highest flexibility of interference management compared with all other schemes, it has a higher transmitter and receiver complexity. The generalized RS requires the decoding order of multiple streams with the same stream order to be jointly decided with the precoders, and each user requires an exponentially increasing number of SIC layers to decode the intended streams sequentially. For example, each user needs to decode two 2-order streams in Fig. 3.6, and in total, three layers of SIC are required at each user.

In the multi-antenna NOMA and the generalized RS schemes, the number of SIC layers increases with the number of users K, which not only increases the scheduler and receiver complexity but also causes more error propagation in SIC. These schemes are therefore better suited to scenarios where K is small, so as to achieve a better trade-off between the performance improvement and the transmitter/receiver complexity.
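The per-user SIC requirements stated in this section can be collected in one helper for comparison. The function name, the scheme labels, and the argument `group_size` (standing for \(|\mathcal {K}_g|\)) are assumptions of this sketch.

```python
def sic_layers(scheme, K, group_size=None):
    """Number of SIC layers required at each user in a K-user system."""
    if scheme == "1-layer RS":
        return 1
    if scheme == "2-layer HRS":
        return 2
    if scheme == "generalized RS":
        return 2 ** (K - 1) - 1
    if scheme == "SC-SIC":
        return K - 1
    if scheme == "SC-SIC per group":
        if group_size is None:
            raise ValueError("group_size is required for SC-SIC per group")
        return group_size - 1
    raise ValueError(f"unknown scheme: {scheme}")
```

For K = 10, this gives 1 layer for 1-layer RS, 9 layers for SC–SIC, and 511 layers for the generalized RS, which illustrates why the latter two are better suited to small K.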

2.3.3 Performance Comparison

Figure 3.10 illustrates the preferred regions for the operation of OMA, SDMA, NOMA, and RSMA with perfect CSIT. Following the evaluations in [58], we assume that the BS equipped with \(N_t=2\) transmit antennas is serving two single-antenna users (K = 2). The channel vectors are \({\mathbf {h}}_1=1/\sqrt {2}[1, 1]^H\) and \({\mathbf {h}}_2=\gamma /\sqrt {2}[1, e^{j\theta }]^H\). As there are only two users, SC–SIC per group boils down to SC–SIC, and the generalized RS boils down to 1-layer RS. The precoders are optimized based on the weighted minimum mean square error (WMMSE) precoding optimization framework developed in [30, 42, 66] with the aim of maximizing the sum rate \(\sum _{k=1,2}R_{k,tot}\). The total achievable rate of user-k for RS is given in (3.7). The sum rate formulas of SDMA and NOMA are given in [42]. The colors in Fig. 3.10 indicate the strategy that achieves the maximum WSR as a function of \(\rho =1-\frac {|{\mathbf {h}}_1^H{\mathbf {h}}_2|{ }^2}{\left \|{\mathbf {h}}_1\right \|{ }^2\left \|{\mathbf {h}}_2\right \|{ }^2}\) (ranging from 0 to 1) and \(\gamma _{dB}=20\log _{10}(\gamma )\) (ranging from 0 to −20 dB), i.e., user-1 has a long-term SNR of 20 dB, and user-2 has a long-term SNR of \(20\,\text{dB}+\gamma _{dB}\), which lies between 0 dB and 20 dB. As the WSR of RSMA is always larger than or equal to that of other strategies, we follow the rules below to select the strategy:

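  (i) if \(|\mathrm{WSR}_{\mathrm{RSMA}}-\mathrm{WSR}_{\mathrm{OMA}}|<\epsilon \), the preferred strategy is OMA.

  (ii) if \(|\mathrm{WSR}_{\mathrm{SDMA}}-\mathrm{WSR}_{\mathrm{OMA}}|>\epsilon \) and \(|\mathrm{WSR}_{\mathrm{RSMA}}-\mathrm{WSR}_{\mathrm{SDMA}}|<\epsilon \), the preferred strategy is SDMA.

  (iii) if \(|\mathrm{WSR}_{\mathrm{NOMA}}-\mathrm{WSR}_{\mathrm{SDMA}}|>\epsilon \) and \(|\mathrm{WSR}_{\mathrm{RSMA}}-\mathrm{WSR}_{\mathrm{NOMA}}|<\epsilon \), the preferred strategy is NOMA.

  (iv) if \(|\mathrm{WSR}_{\mathrm{RSMA}}-\mathrm{WSR}_{\mathrm{SDMA}}|>\epsilon \) and \(|\mathrm{WSR}_{\mathrm{RSMA}}-\mathrm{WSR}_{\mathrm{NOMA}}|>\epsilon \), the preferred strategy is RSMA.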

Option (iv) is selected when RSMA does not boil down to any other multiple access technique. We observe from the figure that when an equal or higher weight is allocated to the user with the stronger channel, NOMA has no benefit over SDMA at all. NOMA outperforms SDMA only when user fairness is taken into consideration with a higher weight allocated to the weaker user. Even then, NOMA is only preferred for deployments with small ρ, i.e., when the user channels are closely aligned. SDMA is preferred whenever ρ is sufficiently large. In comparison, for all different user weights, RSMA always provides the same or better performance than SDMA, NOMA, and OMA. It unifies and outperforms existing multiple access techniques.
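Rules (i)–(iv) can be sketched as a small selection function. Checking the rules in the listed order, the dictionary input format, and returning `None` when no rule fires (which can only happen when a difference sits exactly at the threshold) are assumptions of this illustration.

```python
def preferred_strategy(wsr, eps=0.01):
    """Select the preferred strategy from the WSR of each scheme.

    wsr maps "RSMA", "NOMA", "SDMA", and "OMA" to their weighted sum rates.
    """
    rsma, noma, sdma, oma = wsr["RSMA"], wsr["NOMA"], wsr["SDMA"], wsr["OMA"]
    if abs(rsma - oma) < eps:                               # rule (i)
        return "OMA"
    if abs(sdma - oma) > eps and abs(rsma - sdma) < eps:    # rule (ii)
        return "SDMA"
    if abs(noma - sdma) > eps and abs(rsma - noma) < eps:   # rule (iii)
        return "NOMA"
    if abs(rsma - sdma) > eps and abs(rsma - noma) > eps:   # rule (iv)
        return "RSMA"
    return None  # boundary case not covered by the four rules
```

As a usage example with hypothetical rate values, `preferred_strategy({"RSMA": 5.0, "NOMA": 4.0, "SDMA": 4.5, "OMA": 3.0})` returns `"RSMA"` because RSMA exceeds both SDMA and NOMA by more than \(\epsilon \).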

Fig. 3.10 Regions of operation for different multiple access techniques, K = 2, SNR = 20 dB, \(\epsilon =0.01\). (a) \(u_1=10^{0.5}, u_2=1\). (b) \(u_1=1, u_2=1\). (c) \(u_1=1, u_2=10^{0.5}\)
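The two axes of Fig. 3.10 can be computed from the channel parameters γ and θ. The sketch below builds the two-user channels of this example and evaluates ρ, normalized by the norms of both channels, together with \(\gamma _{dB}\). The function names are illustrative, and the sign convention of θ is an assumption that does not affect ρ.

```python
import cmath
import math

def two_user_channels(gamma, theta):
    """Channels h1 = [1, 1]/sqrt(2) and h2 = gamma*[1, exp(j*theta)]/sqrt(2)."""
    s = 1 / math.sqrt(2)
    h1 = [complex(s), complex(s)]
    h2 = [complex(gamma * s), gamma * s * cmath.exp(1j * theta)]
    return h1, h2

def rho(h1, h2):
    """rho = 1 - |h1^H h2|^2 / (||h1||^2 ||h2||^2); 0 if aligned, 1 if orthogonal."""
    inner = sum(a.conjugate() * b for a, b in zip(h1, h2))
    norm1 = sum(abs(a) ** 2 for a in h1)
    norm2 = sum(abs(b) ** 2 for b in h2)
    return 1 - abs(inner) ** 2 / (norm1 * norm2)

def gamma_db(gamma):
    """Channel strength disparity of user-2 relative to user-1 in dB."""
    return 20 * math.log10(gamma)
```

For these channels, ρ depends only on θ and equals \((1-\cos \theta )/2\): θ = 0 gives aligned channels (ρ = 0), and θ = π gives orthogonal channels (ρ = 1). The disparity γ = 0.1 corresponds to \(\gamma _{dB}=-20\) dB.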

Figure 3.11 further illustrates the ergodic rate region of different multiple access techniques over 100 random channel realizations with imperfect CSIT. The BS is equipped with \(N_t=2\) antennas and serves two single-antenna users. The channel model specified in [30, 34] is adopted, i.e., \({\mathbf {h}}_k=\widehat {\mathbf {h}}_k+\widetilde {\mathbf {h}}_k\). The estimated channel of each user \(\widehat {\mathbf {h}}_k\) and channel error \(\widetilde {\mathbf {h}}_k\) have independent and identically distributed (i.i.d.) complex Gaussian entries that follow the distributions \(\mathcal {C}\mathcal {N}(0,\sigma _k^2)\) and \(\mathcal {C}\mathcal {N}(0,\sigma _{e,k}^2)\), respectively. The variance of error \(\sigma _{e,k}^2\) scales exponentially with SNR as \(\sigma _{e,k}^2\sim \mathcal {O}(P_t^{-\alpha })\), where \(\alpha \in [0,\infty )\) is interpreted as the quality of CSIT in the high SNR regime [5, 28,29,30, 67]. The rate region improvement of RSMA over NOMA and SDMA is significant in all subfigures. Thanks to its flexible interference management capability, RSMA is more robust to CSIT inaccuracy and channel strength disparities between the users. In contrast, NOMA is only suited when there is a certain channel strength disparity between the two users, while SDMA is suited when users have equal channel strength. Moreover, the performance of SDMA drops as CSIT becomes inaccurate.
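One realization of this imperfect-CSIT channel model can be drawn as in the sketch below. Setting \(\sigma _{e,k}^2=P_t^{-\alpha }\) with a unit scaling constant, taking \(P_t=100\) for an SNR of 20 dB with unit noise variance, and the function names are assumptions of this example; the text only specifies the scaling law.

```python
import math
import random

def error_variance(P_t, alpha):
    """Channel error variance, assumed here to equal P_t^(-alpha) exactly."""
    return P_t ** (-alpha)

def complex_gaussian(variance, rng):
    """One CN(0, variance) sample: real and imaginary parts each have variance/2."""
    std = math.sqrt(variance / 2)
    return complex(rng.gauss(0, std), rng.gauss(0, std))

def draw_channel(N_t, sigma2, P_t, alpha, rng):
    """Return (h, h_hat, h_err) with h = h_hat + h_err for one user."""
    sigma_e2 = error_variance(P_t, alpha)
    h_hat = [complex_gaussian(sigma2, rng) for _ in range(N_t)]
    h_err = [complex_gaussian(sigma_e2, rng) for _ in range(N_t)]
    h = [a + b for a, b in zip(h_hat, h_err)]
    return h, h_hat, h_err

rng = random.Random(0)  # fixed seed so that the draw is reproducible
h, h_hat, h_err = draw_channel(N_t=2, sigma2=1.0, P_t=100.0, alpha=0.6, rng=rng)
```

A larger α shrinks the error variance at a given transmit power, e.g., `error_variance(100.0, 1.0)` is 0.01, whereas α = 0 leaves an error variance of 1 regardless of the power.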

Fig. 3.11 Ergodic rate region comparison of different multiple access techniques with partial CSIT, K = 2, SNR = 20 dB. (a) \(\alpha =0.6, \sigma _2^2=1\). (b) \(\alpha =0.6, \sigma _2^2=0.09\). (c) \(\alpha =0.9, \sigma _2^2=1\). (d) \(\alpha =0.9, \sigma _2^2=0.09\)

We further consider the three-user case. The generalized RS does not reduce to 1-layer RS, and SC–SIC per group does not reduce to SC–SIC. Figure 3.12 illustrates the ergodic sum rate versus CSIT accuracy α of different strategies over 100 random channel realizations with imperfect CSIT. Figure 3.12(a) considers an underloaded regime, while Fig. 3.12b, c shows the results of an overloaded regime but with different channel strength disparities among users. The precoders are designed to maximize the ergodic sum rate (where users have equal weights) subject to a QoS rate constraint of each user. For α = [0.2, 0.4, 0.6, 0.8, 1], the corresponding rate constraint for user-k (k ∈{1, 2, 3}) changes as \({\mathbf {r}}_k^{th}=[0.1,0.2,0.3,0.4,0.5]\) bit/s/Hz. In all subfigures, the ergodic sum rate of SC–SIC and MU–LP drops dramatically as α decreases. In contrast, the generalized RS further boosts the system performance and achieves explicit rate gain over all other strategies especially when CSIT is severely inaccurate or in the overloaded regime.

Fig. 3.12 Ergodic sum rate versus CSIT inaccuracy comparison of different multiple access techniques, averaged over 100 random channel realizations, K = 3, SNR = 20 dB. (a) \(\sigma _1^2=\sigma _2^2=\sigma _3^2=1, N_t=4\). (b) \(\sigma _1^2=\sigma _2^2=\sigma _3^2=1, N_t=2\). (c) \(\sigma _1^2=\sigma _2^2=1, \sigma _3^2=0.09, N_t=2\)

In an extremely overloaded scenario, we further show in Fig. 3.13 the WSR improvement of 1-layer RS, which comes with a much lower receiver complexity than SC–SIC. The BS is equipped with two antennas and serves ten users. The rate of each user is averaged over ten randomly generated channels. As SNR increases as [0, 5, …, 30] dB, the QoS rate constraint of each user increases as [0, 0.001, 0.004, 0.01, 0.03, 0.06, 0.1] bit/s/Hz. We observe that 1-layer RS exhibits explicit WSR improvement over all other strategies. It achieves a sum-DoF of 2 with only a single layer of SIC deployed at each user. In contrast, the slopes of the WSRs of SC–SIC and MU–LP are the same and smaller than that of 1-layer RS. This implies that SC–SIC and MU–LP achieve a sum-DoF of 1. Moreover, SC–SIC requires nine layers of SIC at each user. RS is able to exploit the largest DoF in such an overloaded deployment by using the common stream to pack messages from eight users while using the two private streams to serve the remaining two users. In contrast, SC–SIC and MU–LP allocate most of the power to a single user, which limits their achievable DoF.

Fig. 3.13 Weighted sum rate versus SNR comparison of different multiple access techniques for overloaded ten-user deployment with perfect CSIT. \(\sigma _1^2=1, \sigma _2^2=0.9, \ldots , \sigma _{10}^2=0.1\), \(N_t=2\)

2.4 Advantages of RSMA

Based on the above comparison from framework, complexity, and performance aspects, we here summarize the major advantages of RSMA:

  • Universal: RSMA is a more general multiple access framework that outperforms and unifies OMA, SDMA based on MU–LP, and multi-antenna NOMA as sub-schemes.

  • Flexible: RSMA is suited to all user deployments (with a diversity of channel directions and channel strengths) and network loads (underloaded and overloaded regimes). This implies that RSMA is capable of flexibly managing all different kinds of interference. RSMA automatically reduces to other multiple access techniques according to the channel conditions, e.g., it reduces to SDMA when user channels are orthogonal in the underloaded MISO BC with perfect CSIT. When the channels are aligned with certain channel strength disparities, it automatically boils down to NOMA. For other channel conditions, RS takes advantage of the common streams and achieves better interference management by partially decoding the interference and partially treating the remaining interference as noise.

  • Robust: RSMA is robust to CSIT inaccuracy. As RSMA is primarily motivated by multi-antenna deployments with multiuser interference coming from imperfect CSIT, it compensates the DoF loss of other multiple access techniques in imperfect CSIT and is therefore less sensitive to CSIT inaccuracy.

  • Spectrally efficient: The spectral efficiency of RSMA is always larger than or equal to that of existing multiple access techniques. Considering a MISO BC without QoS constraints, the rate region of RSMA comes much closer to the optimal DPC region than SDMA and NOMA when CSIT is perfect. When CSIT becomes imperfect, linearly precoded RSMA is able to achieve a larger rate region than complex DPC in multi-antenna BC. As RSMA achieves the optimal DoF in both perfect and imperfect CSIT, it optimally exploits the spatial dimensions and the availability of CSIT. This contrasts with SDMA and NOMA that are suboptimal.

  • Energy efficient: As RSMA is more general than SDMA and NOMA, its energy efficiency is also larger than or equal to that of existing multiple access techniques in a wide range of user deployments.

  • Enhancing QoS and fairness: RSMA exhibits a more explicit performance gain over other multiple access techniques when there is a QoS rate constraint for each user or when a higher weight is allocated to the user with a weaker channel condition. Therefore, the ability of a wireless network architecture to partially decode interference and partially treat interference as noise leads to enhanced QoS and user fairness.

  • Reducing complexity: The performance gain of RSMA can come with a lower transmitter and receiver complexity than multi-antenna NOMA. In contrast to multi-antenna NOMA that requires user grouping, ordering, and switching (between NOMA and SDMA) at the transmit scheduler and multiple layers of SIC at the receivers, 1-layer RS without any user ordering, grouping, or dynamic switching at the transmit scheduler and with only one layer of SIC at each receiver is capable of achieving significant performance gain over NOMA (as illustrated in Fig. 3.13). In contrast to SDMA that requires user pairing to pair users with semi-orthogonal channels, RSMA is suited to all channel conditions, and it does not require complex user scheduling and pairing. Moreover, RSMA is capable of further reducing CSI feedback overhead [41, 68] in the presence of quantized feedback.

3 Emerging Applications of RSMA

RSMA was originally proposed for MIMO BC in cellular communication networks. Recently, the applications of RSMA in other networks enabled by 5G technologies have attracted substantial interest. In massive MIMO systems, 2-layer HRS proposed in [39] has been shown to achieve superior sum rate performance over conventional two-tier precoding schemes based on SDMA [69,70,71], and 1-layer RS has been shown to be a more robust strategy for massive MIMO in the presence of phase and amplified thermal noise since its sum rate does not saturate at high SNR [40]. The application of RSMA in the multigroup multicasting system has been shown to boost the DoF in the high SNR regime as well as to enhance the system performance in the low SNR regime [17]. By using one common stream to encapsulate parts of the multicast messages for different multicast groups, RSMA based on 1-layer RS enables the interference to be partially decoded and partially treated as noise. Recent research has shown that RSMA is more energy efficient in multicell multigroup systems [43] and enhances user fairness in multicarrier multigroup multicast systems [59]. In mmWave MIMO communication systems, the authors in [41] employ 1-layer RS and propose a one-stage feedback scheme which effectively reduces the complexity of the signaling and feedback procedure. The benefits of RSMA have been further discovered in other applications such as non-orthogonal unicast and multicast transmission (NOUM) [57], coordinated multipoint (CoMP) [49], cloud-radio access networks (C-RAN) [52, 53], simultaneous wireless information and power transfer (SWIPT) [54, 55], cooperative relaying [56, 61], wireless caching [37, 72], and unmanned aerial vehicle (UAV)-aided wireless communications [50, 51], which will not be detailed here. Motivated by the benefits of RSMA discovered in cellular communications, RSMA has been applied to other communication networks such as radar communications [73] and satellite communications [48, 74]. These applications are summarized in Fig. 3.14.

Fig. 3.14 Emerging applications of RSMA

4 Challenges and Future Trends of RSMA

The study of RSMA is still in its infancy. Even for the applications specified in Sect. 3.3, there are still many challenges and open issues that remain to be addressed. RSMA is a goldmine of research problems for academia and standard specification issues for industry. The multifarious attractive and potential research directions of RSMA are summarized in Fig. 3.15.

Fig. 3.15 Structure of future research directions

RSMA can be applied together with techniques other than those described in Sect. 3.3. Some of these techniques are complementary, and studying RSMA in combinations of them may spark new insights. For example, RSMA has shown performance benefits separately in cooperative relaying and in NOUM. There is also great potential in applying RSMA to NOUM with cooperative relaying. In such networks, the system performance is expected to be further enhanced, since the common stream forwarded from the relaying users to other users helps to enhance not only the rate of the multicast message for all users but also the rates of the unicast messages for the corresponding users.

In addition to the promising systems summarized in Fig. 3.15, there are many other combinations of RSMA that are worth studying, such as RSMA in UAV-aided, radar, or satellite communication systems. In UAV-aided communications, one major challenge is UAV deployment and trajectory optimization. However, perfectly tracking the rapidly changing channels over the entire location map is impossible, which results in strong co-channel interference. As RSMA is superior in robust interference management and achieves higher performance gains when CSIT is imperfect, applying RSMA in the UAV-aided multi-antenna broadcast channel has great potential to overcome that challenge.

The key technologies required to implement RSMA are MU–MIMO/CoMP, superposition coding at the transmitter, SIC at receivers, and non-orthogonal unicast and multicast transmission. Though the standardization of RSMA has not yet been taken up by the 3rd Generation Partnership Project (3GPP), some current work items in 3GPP can be leveraged for the implementation of RSMA. MU–MIMO and CoMP are key technologies in LTE, included in 3GPP Release 8 [75] and 3GPP Release 11 [76], respectively. One major receiver technique used in RSMA is SIC, which has been incorporated in 3GPP Release 12 for network-assisted interference cancellation and suppression (NAICS) [77]. In 3GPP Release 13, superposition coding at the transmitter and successive decoding at each receiver have been further considered for LTE downlink mobile broadband (MBB) services as multiuser superposition transmission (MUST) [78]. The multicast functionality has recently been included in 3GPP Release 17 for 5G under the name new radio (NR) multicast/broadcast. Besides the necessary machinery discussed or approved by 3GPP, there are some implementation issues specific to RSMA which require further study. First, the CSI feedback mechanisms of RSMA are unclear, even though RS has been shown to reduce the CSIT feedback overhead compared to MU–LP in the presence of quantized feedback [38]. Second, the downlink and uplink signaling of RS remains unclear. In particular, the transmitter and the receivers need a synchronized understanding of how each stream is split and merged. Last but not least, there is still a lack of link-level and system-level evaluation of RSMA. To identify the recommended configurations of RSMA in the physical layer (such as frequency band, coding scheme, modulation scheme, transceiver design, topography, etc.) or higher layers (such as scheduling, error-control scheme in the multiple access layer or QoS requirements in the application layer, etc.), the link-level and system-level performance of RSMA needs to be investigated thoroughly.
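The split/merge synchronization issue mentioned above can be illustrated with a toy Python sketch. It assumes, purely for illustration, that messages are byte strings, that the common part of each message is its prefix, and that the prefix lengths (`common_lengths`) are the side information the transmitter and receivers must share. The function names and this signaling format are hypothetical and do not correspond to any 3GPP procedure; channel coding, precoding, and SIC are omitted.

```python
def split_messages(messages, common_lengths):
    """Transmitter side: split each message and merge the common parts.

    Returns the single common message (concatenated common parts) and the
    list of private parts, one per receiver.
    """
    common_parts = [m[:n] for m, n in zip(messages, common_lengths)]
    private_parts = [m[n:] for m, n in zip(messages, common_lengths)]
    return b"".join(common_parts), private_parts


def merge_message(k, common_message, private_part, common_lengths):
    """Receiver k: extract its own common part and rebuild its message.

    The receiver needs the same common_lengths as the transmitter;
    otherwise it cannot locate its part inside the common message.
    """
    start = sum(common_lengths[:k])
    own_common = common_message[start:start + common_lengths[k]]
    return own_common + private_part


# Toy example (assumed values): two receivers, prefixes of 2 and 3 bytes.
messages = [b"hello", b"world!"]
common_lengths = [2, 3]
common_message, private_parts = split_messages(messages, common_lengths)
recovered = [
    merge_message(k, common_message, private_parts[k], common_lengths)
    for k in range(len(messages))
]
```

If a receiver held a different `common_lengths` than the transmitter, it would extract the wrong segment of the common message, which is why this information has to be conveyed through signaling.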