1 Introduction

IEEE 802.11 Wireless Local Area Networks (WLANs) have become an indispensable part of daily life, both at home and in the workplace. Because of problems such as frame collisions and protocol overheads, the throughput of WLANs is significantly lower than the raw data rate that the Physical (PHY) layer can achieve [1]. The evolution of Internet traffic is going to exacerbate this low-throughput problem. Internet traffic has shifted from web browsing and file transfers to a wide variety of applications, many of which integrate content-rich files provided by users [2, 3]. This shift, mainly driven by bandwidth-hungry cloud and multimedia applications, demands a performance increase in both the downlink and the uplink of WLANs [4].

Spatial multiplexing is one of the current trends (spatial diversity and frame aggregation are among the others) aimed at improving the performance of wireless systems. IEEE 802.11n [5] supports spatial multiplexing in the point-to-point communication mode (i.e., Single-user MIMO or SU-MIMO). The point-to-multipoint communication mode, for example, the transmission from the Access Point (AP) to multiple stations (STAs) (i.e., downlink Multi-user MIMO or MU-MIMO), is supported by the latest IEEE amendment, 802.11ac [6]. However, the uplink MU-MIMO enhancement, which is crucial to mitigate collisions and to satisfy the performance requirements of uploading-intensive scenarios, has not yet been supported by any IEEE standard.

In this paper, we propose a unified down/up-link MU-MIMO Medium Access Control (MAC) protocol called Uni-MUMAC, which coordinates distributed STAs to exploit the spatial multiplexing gain and thereby improve the performance of IEEE 802.11ac WLANs. The main contributions are summarized as follows. (1) Two separate MU-MIMO MAC protocols, one for the downlink transmission [7] and the other for the uplink transmission [8], are integrated into a unified MU-MIMO MAC protocol. Unlike [7, 8], where only one-way traffic is considered (i.e., the downlink or the uplink), this work takes the presence of both downlink and uplink transmissions into account. (2) A special focus is placed on finding the most suitable value of the 2-nd round Contention Window \((CW_{\rm 2nd})\) to obtain the highest system throughput, and the impact of the optimized uplink transmission on the downlink is discussed. With the optimized \(CW_{\rm 2nd}\) and other properly configured parameters (e.g., the number of aggregated frames and the queue length of the AP), Uni-MUMAC is then extensively evaluated through simulations in the downlink-dominant and the down/up-link balanced traffic scenarios. (3) An analytic model is developed to validate the simulation results, and a prominent proposal from the literature is implemented for comparison with our scheme.

The rest of the paper is organized as follows. First, Sect. 2 explores some of the key MU-MIMO MAC proposals in the literature. Then, Sect. 3 introduces the modified frame structure and detailed Uni-MUMAC operating procedures. After that, Sect. 4 gives the considered scenarios to evaluate Uni-MUMAC, the saturation throughput model, simulation results and observations. Finally, Sect. 5 concludes the paper and discusses the future research challenges.

2 Related work

Most previous work has focused on adjusting MAC parameters or extending MAC functions to improve the performance of WLANs. In the downlink, the spatial multiplexing technique has recently gained much attention. To support it, many proposals in the literature adopt the following MAC procedure. The AP first sends out a modified Request to Send (RTS) containing a group of targeted STAs. The listed STAs then estimate the channel, add the estimated Channel State Information (CSI) to the extended Clear to Send (CTS) and send it back. As soon as the AP has successfully received all CTSs, it precodes the outgoing signals and sends multiple data frames simultaneously.

Cai et al. [9] propose a distributed MU-MIMO MAC protocol that modifies RTS and CTS frames to estimate the channel, based on which the AP is able to concurrently transmit frames to multiple STAs. Kartsakli et al. [10] consider an infrastructure WLAN and propose four multi-user scheduling schemes to simultaneously transmit frames to STAs. The results show that the proposal achieves notable gains compared to the single-user case. Gong et al. [11] propose a modified Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA) protocol with three different ACK-replying mechanisms. The authors claim that the proposed protocol can provide a considerable performance improvement over the beamforming-based approach when the Signal-to-Noise Ratio (SNR) is high. Zhu et al. [12] investigate the MAC modifications required to support downlink MU-MIMO transmissions by focusing on the fairness issue. The proposed Transmit Opportunity (TXOP) sharing scheme not only obtains a higher throughput but is also fairer than the conventional mechanism. Cha et al. [13] compare the performance of a downlink MU-MIMO scheme with a Space Time Block Coding (STBC) based frame aggregation scheme. The results show that the former produces a higher throughput than the latter if the transmitted frames are of similar length.

The uplink enhancement is getting more attention as the popularity of Peer-to-Peer (P2P) and cloud applications increases. In general, there are two broad categories of uplink MU-MIMO MAC enhancements, namely, the un-coordinated access and the coordinated access. The former relies on the random access mechanism of the MAC to decide which STAs are allowed to transmit data, while the latter employs the AP to schedule the STAs’ uplink access.

Some of the un-coordinated uplink access schemes are sampled as follows. Jin et al. [14] evaluate the performance of uplink MU-MIMO transmissions in the IEEE 802.11 basic access mode, where the simultaneous uplink transmissions take place on a random access basis and the channel coefficients of each STA are assumed to be known by the AP. Zheng et al. [15] present a Distributed Coordination Function (DCF) enhancement called Two-Round RTS Contention (TRRC) to take advantage of the spatial domain. The proposed scheme allows STAs to contend for the channel after a successful RTS is detected. Tan et al. [16] present a distributed MAC scheme called Carrier Counting Multiple Access (CCMA), where a beacon that contains the uplink access threshold is announced by the AP periodically. Based on the threshold, STAs count the number of ongoing transmissions by monitoring preambles, and then decide whether to contend for the channel or stay idle. Babich et al. [17] investigate the theoretical model of asynchronous frame transmissions, where a STA is allowed to transmit even if other STAs are already transmitting.

Some of the coordinated uplink access schemes are overviewed as follows. Tandai et al. [18] propose a synchronized uplink transmission scheme coordinated by the AP. On receiving requests from STAs, the AP broadcasts a pilot-Requesting CTS (pR-CTS) to schedule STAs’ pilot transmissions for estimating the channel. After obtaining the CSI, the AP sends a Notifying-CTS (N-CTS) to inform the selected STAs to transmit frames in parallel. Zhou et al. [19] propose a two-round channel contention mechanism, which divides the MAC procedure into two parts, namely, the random access and the data transmission. The random access terminates when the AP receives a predefined number of successful RTSs, and then the data transmission follows. Zhang [20] further extends the two contention rounds to multiple rounds, which enable more STAs to be involved in parallel uplink transmissions. The proposed protocol can fall back to the single-round mode automatically on condition that the traffic is low and the single-round scheme can provide higher throughput. Jung et al. [21] present an asynchronous uplink Multi-Packet Reception (MPR) scheme, where an additional feedback channel is assumed to be employed by the AP to acknowledge the successful frame receptions along with other ongoing transmissions.

Only a few works have combined the downlink and the uplink transmissions. In [22], Shen et al. propose a High Throughput MIMO (HT-MIMO) MAC protocol, which utilizes frequency signatures to differentiate simultaneously-received control messages. The proposal works in the Point Coordination Function (PCF) mode, hence both downlink and uplink transmissions can only be initiated by the AP. In [23], Jin et al. focus on the unbalanced throughput problem between downlink and uplink, where a Contention Window (CW) adjustment scheme and a random piggyback scheme are proposed to increase the downlink throughput ratio. In [24], Li et al. propose a multi-user transmission MAC scheme, which supports the Multi-Packet Transmission (MPT) in the downlink and multiple control frame receptions (e.g., CTSs or ACKs) in the uplink, while simultaneous data transmissions from multiple STAs are not considered. Owing to its simplicity, the MAC scheme of [24] is implemented for comparison with our proposal.

3 Uni-MUMAC operations

Uni-MUMAC is based on the IEEE 802.11 Enhanced Distributed Channel Access (EDCA), which relies on the CSMA/CA mechanism to share the wireless channel. EDCA can operate in either the basic access mode or the optional RTS/CTS handshaking one. In this paper, Uni-MUMAC adopts and extends the RTS/CTS scheme for the following reasons: (1) The AP can notify the uplink contending STAs about the number of available antennas by a modified control frame; (2) The AP can estimate the CSI from the RTS/CTS exchanging process; (3) The distributed STAs can be synchronized from the exchanging process to transmit to the AP in parallel.

3.1 Frame structure

3.1.1 PHY frame structure

The PHY frame structure of IEEE 802.11ac is shown in Fig. 1, where VHT PLCP, PPDU and MPDU stand for Very High Throughput Physical Layer Convergence Protocol, PLCP Protocol Data Unit and MAC Protocol Data Unit, respectively. As the frame structure shows, a PPDU consists of the PHY preamble and MPDUs. IEEE 802.11ac specifies that all MPDUs must be transmitted in the Aggregated-MPDU (A-MPDU) format, where aggregated MPDUs are separated by MPDU delimiters. Before the A-MPDU is delivered to the PHY layer, a service field and a tail field are appended to it. The PHY preamble is formed by 3 legacy fields for backward compatibility (i.e., L-STF, L-LTF and L-SIG) and some newly introduced VHT fields [6, 25].

Fig. 1 PHY frame format of IEEE 802.11ac

IEEE 802.11ac introduces these VHT fields to assist WLANs in obtaining high performance. A Group Identifier (Group-ID) field is added to the VHT Signal Field-A (VHT-SIG-A), which is used to inform the targeted STAs about the subsequent MU-MIMO transmission, as well as the order and the position of each STA’s corresponding stream. A complete Group-ID table is created and disseminated by the AP, and is recomputed as STAs associate with or disassociate from the AP. Since the number of STA combinations can exceed the number of available Group-IDs in a large basic service set, and the down/up-link channels may be different, we assume that a single Group-ID can refer to multiple transmission sets, and that other PHY preamble features can be used to resolve the intended STAs [26]. In other words, there will always be at least one proper Group-ID entry that can be mapped to the intended transmission set.

The VHT Long Training Field (VHT-LTF) carries an orthogonal training sequence that is known by both the transmitter and the receiver and is used to estimate the MIMO channel. To estimate the channel precisely, the number of VHT-LTF fields should not be less than the number of transmitted spatial streams. The legacy and VHT-SIG-A fields adopt a low-rate modulation scheme to make the preamble understandable to all STAs, while the remaining VHT fields and the A-MPDU are transmitted using the VHT modulation scheme. In this paper, a single modulation and coding scheme (MCS), i.e., 16-QAM with a coding rate of 1/2, is used for all frame transmissions to simplify the simulation, although the extension to various MCSs for different frames and STAs is straightforward. Here, we only introduce the PHY features that are closely related to the proposed protocol. Readers can refer to [6] for details of other PHY features.

3.1.2 MAC frame structure

The control frames of Uni-MUMAC are shown in Figs. 2 and 3. In the downlink, the control frames are MU-RTS, MU-CTS and MU-ACK. MU-RTS keeps the standard RTS frame structure, because the AP can utilize the Group-ID field of the PHY frame to notify the targeted receivers. MU-CTS and MU-ACK add a transmitter address field to the original CTS and ACK frames to help the AP differentiate multiple responding STAs. Note that, with this added transmitter address field, MU-CTS and MU-ACK coincidentally have the same frame structure as the standard RTS frame.

Fig. 2 Frame structure of standard RTS

Fig. 3 Modified frames for uplink transmissions. a Ant-CTS. b G-CTS & G-ACK

In the uplink, all frame modifications are limited to the AP side to minimize the STAs’ overhead. These modified frames are Ant-CTS (CTS with antenna information), G-CTS (Group CTS) and G-ACK (Group ACK), as shown in Fig. 3. An antenna information field is added to Ant-CTS, which is broadcast by the AP to announce the number of available antennas (after one antenna is occupied in the first contention round) and the start of the 2-nd contention round. G-CTS and G-ACK have an identical frame structure, where the receiver address field is removed and replaced by the Group-ID field in the IEEE 802.11ac PHY frame, while a transmitter address field is added to indicate the AP address. The G-CTS frame is used to inform STAs of the start of the data transmission, and G-ACK is used to indicate the successful reception of data.

3.2 Successful downlink transmissions

Figure 4 shows a successful Uni-MUMAC downlink transmission. Initially, the channel is assumed busy (B). After the channel has been idle for an Arbitration Inter Frame Space (AIFS), a random backoff (BO) drawn from CW starts to count down and will be frozen as soon as the channel is detected as busy.

Fig. 4 A successful Uni-MUMAC downlink transmission

Suppose the AP first wins the channel contention and sends a MU-RTS. Then, the STAs that are included in the Group-ID reply with MU-CTSs sequentially in the indicated order. The STAs that are not included in the MU-RTS set the Network Allocation Vector (NAV) to defer their transmissions. After a MU-CTS is received, the AP measures the channel through the training sequence included in the PHY preamble, and then uses the estimated CSI to precode the simultaneously-transmitted frames. Because they are precoded, the frames destined to different STAs do not interfere with each other. Finally, the STAs send MU-ACKs simultaneously to acknowledge the successful reception of the data frames.

Note that the uplink channel is assumed to be the same as the downlink one in this paper. In other words, implicit CSI feedback is adopted: the AP estimates the channel using the training sequence included in the MU-CTS. The reason is that explicit CSI feedback would need more computing capability at the STAs and require an extra field in the MU-CTS to carry the measured CSI, which may not be suitable for STAs in capacity- or power-constrained scenarios.

3.3 Successful uplink transmissions

In the uplink, a standard RTS is sent to the AP by the STA that won the 1-st round channel contention. Instead of replying with a CTS, the AP broadcasts an Ant-CTS with two functions: (1) to notify the STA of the successful reception of the RTS, and (2) to inform the other STAs of the number of available antennas and the start of the 2-nd contention round. The STAs that have frames to send compete for the available spatial streams in the 2-nd contention round. A new random \(BO \, (BO_{\rm 2nd})\) drawn from \([0,CW_{\rm 2nd}-1]\) starts to count down, and a RTS is sent when \(BO_{\rm 2nd}\) of a STA reaches 0. The number of available antennas of the AP decreases by one each time an uplink RTS is successfully received. The 2-nd contention round finishes when (1) all available antennas of the AP are occupied, or (2) a predefined duration of the 2-nd contention round elapses because there are not enough contending STAs (the maximum duration of the 2-nd contention round is set to \(CW_{\rm 2nd}\) slots). As soon as the 2-nd contention round finishes, a G-CTS is sent by the AP to indicate its readiness to receive multiple frames in parallel. The G-CTS frame identifies the STAs that have successfully sent RTSs during the 1-st and 2-nd contention rounds. When the G-CTS is received by the targeted STAs, they are synchronized to send data frames to the AP. Finally, the AP acknowledges the received data frames with a G-ACK.

An example of a successful uplink transmission is shown in Fig. 5. In this illustrative case, the AP has 3 antennas, and STA 2 and STA 3 pick \(BO_{\rm 2nd}=0\) and \(BO_{\rm 2nd}=1\) from \([0,CW_{\rm 2nd}-1]\), respectively.

Fig. 5 A successful Uni-MUMAC uplink transmission

It is important to point out that the RTSs sent by STAs in the 2-nd contention round could collide with the G-CTS sent by the AP. For example, the RTS sent by the STA that claims the AP’s last available antenna may not be heard by some STAs (hidden terminals), which therefore assume that the AP still has available antennas. Then, after a Short Inter Frame Space (SIFS) interval, the G-CTS sent by the AP and the RTSs sent by the hidden STAs would collide. To avoid this scenario, STAs are forced to wait for a Multi-User SIFS (MU-SIFS) interval in the 2-nd contention round. MU-SIFS is longer than SIFS but shorter than AIFS, which not only gives the AP priority to send the G-CTS, but also prevents STAs from mistaking the MU-SIFS interval for an idle channel.

3.4 Frame collisions

Collisions occur in the 1-st or the 2-nd contention round if more than one STA chooses the same random backoff value. On sending a RTS, EDCA specifies that the STA has to set a timer according to Eq. (1) to receive the expected CTS, where \(T_{\rm CTS}\) represents the transmission duration of a CTS frame. If the CTS is not received before the timer expires, the STAs that previously sent RTSs assume that a collision occurred. These RTS-sending STAs compete for the channel again after the expiration of the timer. The RTS-receiving STAs cannot decode any of the RTSs correctly. Therefore, after the collision time, the receiving STAs wait for an Extended Inter Frame Space [EIFS, as shown in Eq. (2)] before competing for the channel together with the RTS-sending STAs.

As shown in Fig. 6 (Ant-CTS and MU-CTSs with dotted lines mean these frames would be transmitted if there were no collisions), collisions in the 1-st contention round include two cases: (1) collisions among STAs; (2) collisions between STAs and the AP. Since STAs cannot differentiate these two cases, the collision time has to be set according to the duration of the longer frame, which is \(T_{\rm MU-RTS}\). In addition, to account for the scenario in which the AP is involved in the collision, the \(\text {CTS}_{\rm timer}\) and the EIFS interval have to be extended to \(\text {MU-CTS}_{\rm timer}\) [as shown in Eq. (3), where \(N\) is the number of AP’s antennas] and Multi-User EIFS [MU-EIFS, as shown in Eq. (4)], respectively.

$$\text {CTS}_{{\rm timer}} = \text {SIFS} + T_{{\rm CTS}}$$
(1)
$$\text {EIFS}= \text {SIFS} + T_{{\rm CTS}} + \text {AIFS}$$
(2)
$$\text {MU-CTS}_{{\rm timer}} = N \cdot ( \text {SIFS} + T_{{\rm MU-CTS}})$$
(3)
$$\text {MU-EIFS} = N \cdot ( \text {SIFS} + T_{{\rm MU-CTS}}) + \text {AIFS}$$
(4)

Fig. 6 Collisions in the 1-st contention round
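As a quick numerical check of Eqs. (1) to (4), the timers can be evaluated with a few lines of Python. This is only a sketch: the function names are illustrative, and the durations used in the example (in microseconds) are placeholder assumptions rather than the values of Table 1.

```python
def cts_timer(sifs, t_cts):
    """Eq. (1): time a STA waits for a CTS after sending an RTS."""
    return sifs + t_cts


def eifs(sifs, t_cts, aifs):
    """Eq. (2): deferral of STAs that could not decode the colliding RTSs."""
    return sifs + t_cts + aifs


def mu_cts_timer(n_antennas, sifs, t_mu_cts):
    """Eq. (3): timer extended to cover N sequential MU-CTS replies."""
    return n_antennas * (sifs + t_mu_cts)


def mu_eifs(n_antennas, sifs, t_mu_cts, aifs):
    """Eq. (4): EIFS extended in the same way, plus one AIFS."""
    return n_antennas * (sifs + t_mu_cts) + aifs


if __name__ == "__main__":
    # Placeholder durations in microseconds (assumed, not from Table 1).
    SIFS, AIFS, T_CTS, T_MU_CTS = 16.0, 34.0, 44.0, 48.0
    N = 4
    print("CTS timer   :", cts_timer(SIFS, T_CTS))
    print("EIFS        :", eifs(SIFS, T_CTS, AIFS))
    print("MU-CTS timer:", mu_cts_timer(N, SIFS, T_MU_CTS))
    print("MU-EIFS     :", mu_eifs(N, SIFS, T_MU_CTS, AIFS))
```

The extended timers grow linearly with \(N\), which is why a collision involving the AP is more costly than a collision among single-antenna STAs only.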

If collisions occur in the 2-nd contention round, the colliding STAs will not be indicated as the receivers in G-CTS. Therefore, only the STAs that have successfully sent RTSs in both contention rounds are allowed to transmit frames to the AP at the same time (as illustrated in Fig. 7).

Fig. 7 RTS collisions in the 2-nd contention round

3.5 Other considerations

In IEEE 802.11 EDCA, a STA renews its \(BO\) if the channel contention was successful. For the STAs that did not win the contention, the frozen \(BO\) is used for the next contention round. In this paper, \(BO\) of the 1-st contention round is renewed after collisions in the 1-st round or if the STA is the initiator of the two-round process. Although both STA 1 and STA 2 participate in the transmission shown in Fig. 7, STA 1 is considered to be the initiator. In other words, STA 1 will draw a new random \(BO\) for the following 1-st contention round, while STA 2 will use the frozen \(BO\).

The \(BO_{\rm 2nd}\) renewal policy is more straightforward. Each STA draws a fresh \(BO_{\rm 2nd}\) from \([0,CW_{\rm 2nd}-1]\) as soon as a new 2-nd contention round starts.

G-CTS is sent out by the AP when the number of available antennas reaches zero or the duration of the 2-nd contention round expires. As soon as the Ant-CTS is sent, the AP sets the G-CTS timer to account for up to \(CW_{\rm 2nd}\) slots [as shown in Eq. (5)].

$$\text {G-CTS}_{{\rm timer}} = CW_{{\rm 2nd}} \cdot (\text {MU-SIFS} + T_{{\rm RTS}})$$
(5)
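Eq. (5) can be read as an upper bound on how long the AP waits before sending the G-CTS. The short Python sketch below evaluates it; the function name and the example durations (in microseconds) are assumptions made for illustration only, with MU-SIFS chosen between typical SIFS and AIFS values.

```python
def g_cts_timer(cw_2nd, mu_sifs, t_rts):
    """Eq. (5): G-CTS timer covering up to CW_2nd slots of (MU-SIFS + T_RTS)."""
    return cw_2nd * (mu_sifs + t_rts)


if __name__ == "__main__":
    # Assumed durations in microseconds; not taken from Table 1.
    print(g_cts_timer(cw_2nd=8, mu_sifs=25.0, t_rts=60.0))
```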

4 Performance evaluation

Uni-MUMAC is evaluated using an analytic model and simulations. The analytic model is adapted from Bianchi’s saturation throughput model [27] to support MU-MIMO transmissions in both downlink and uplink. The simulation is implemented in C++ using the Component Oriented Simulation Toolkit (COST) library [28] and the SENSE simulator [29].

A single-hop WLAN implementing Uni-MUMAC is considered as shown in Fig. 8. It consists of one AP and \(M\) STAs with an error-free channel. The AP employs an array of \(N\) antennas, while each STA has only one antenna. The data frame has a fixed length of \(L\) bits. The parameters used to evaluate Uni-MUMAC are listed in Table 1.

Fig. 8 Down/up-link Uni-MUMAC transmissions

Table 1 System parameters

4.1 Saturation throughput analysis

Let \(\tau = \frac{2}{(CW+1)}\) be the transmission probability of a node in a random slot, where \(CW\) is the size of the 1-st round contention window. Then, the probability that the channel is idle is:

$$p_{\rm i}=(1-\tau )^{M+1}.$$
(6)

The probability that the channel sees a successful transmission slot, \(p_{\rm s}\), is given by:

$$p_{\rm s} =\left( {\begin{array}{c}M+1\\ 1\end{array}}\right) \tau (1-\tau )^{M} = (M+1) \tau (1-\tau )^{M},$$
(7)

which accounts for the event that a single node (either the AP or a STA) wins the 1-st round channel contention.

By subtracting \(p_{\rm i}\) and \(p_{\rm s}\) from 1, the probability that the channel observes a collision slot, \(p_{\rm c}\), is obtained:

$$p_{{\rm c}}=1-p_{{\rm i}}-p_{{\rm s}}.$$
(8)
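The three slot probabilities of Eqs. (6) to (8) are easy to evaluate numerically. The following Python sketch does so for a given 1-st round contention window \(CW\) and number of STAs \(M\); the function name and the example values are illustrative assumptions.

```python
def slot_probabilities(cw, m):
    """Return (tau, p_i, p_s, p_c) for M STAs plus one AP.

    tau follows the definition in the text, tau = 2 / (CW + 1).
    """
    tau = 2.0 / (cw + 1)
    p_idle = (1.0 - tau) ** (m + 1)                  # Eq. (6)
    p_success = (m + 1) * tau * (1.0 - tau) ** m     # Eq. (7)
    p_collision = 1.0 - p_idle - p_success           # Eq. (8)
    return tau, p_idle, p_success, p_collision


if __name__ == "__main__":
    # Example values chosen for illustration only.
    tau, p_i, p_s, p_c = slot_probabilities(cw=15, m=8)
    print(f"tau={tau:.4f} p_i={p_i:.4f} p_s={p_s:.4f} p_c={p_c:.4f}")
```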

In the saturated condition, a successful downlink transmission always contains \(N\) (the number of AP antennas) data streams. Therefore, the number of bits of a successful downlink transmission \((N_{\rm b,down})\) is:

$$N_{\rm b,down}= \alpha \cdot N \cdot N_{\rm f} \cdot L \cdot p_{\rm s},$$
(9)

where \(\alpha =\frac{1}{M+1}\) is the probability that a transmission is from the AP, and \(N_{\rm f}\) is the number of aggregated frames in an A-MPDU.

The calculation of the successfully received number of bits of uplink \((N_{\rm b,up})\) has to account for successful transmissions of both 1-st and 2-nd contention rounds:

$$N_{\rm b,up}= (1-\alpha ) \cdot N_{\rm f} \cdot L \cdot p_{\rm s} \cdot \sum _{\rm x=1}^{N} p_{\rm x\_ant} \cdot \text {x},$$
(10)

where \(p_{\rm x\_ant}\) is the probability that \(\hbox {x} (\hbox {x} \in [1, N])\) antennas of the AP have been used for the uplink transmission. In other words, one antenna has been obtained by a STA in the 1-st contention round, and x-1 antennas have been successfully obtained by STAs in the 2-nd contention round.

The duration of a successful downlink transmission, \(T_{\rm s,down}\), is:

$$T_{\rm s,down} = \text {AIFS} +T_{\rm MU-RTS} + N\cdot (T_{\rm MU-CTS} + \text {SIFS}) + T_{\rm A-MPDU} + T_{\rm MU-ACK} + 2 \cdot \text {SIFS}.$$
(11)

Eq. (12) shows how the durations of a MU-RTS frame and a data frame are calculated from the system parameters of Table 1. \(T_{\rm PHY}(N)=36+N\cdot 4\) \(\mu\)s is the duration of the PHY header (the number of VHT-LTF fields is proportional to the number of AP antennas \(N\)); \(L_{\rm service}\), \(L_{\rm tail}\) and \(L_{\rm delimiter}\) are the lengths of the service field, the tail field and the MPDU delimiter; \(L_{\rm DBPS}\) and \(T_{\rm symbol}\) are the number of data bits in a symbol and the symbol duration; \(N_{\rm f}\) is the number of aggregated frames in an A-MPDU; \(L_{\rm MU-RTS}\) and \(L_{\rm MAC}\) are the lengths of the MU-RTS and the MAC header, respectively. A more detailed calculation of the frame duration can be found in [30].

$$\left\{ \begin{array}{l} T_{{\rm MU-RTS}}=T_{{\rm PHY}}(N)+\Big \lceil \frac{L_{{\rm service}}+L_{{\rm MU-RTS}}+L_{{\rm tail}}}{L_{{\rm DBPS}}}\Big \rceil T_{{\rm symbol}} \\ T_{{\rm A-MPDU}}=T_{{\rm PHY}}(N)+\Big \lceil \frac{L_{{\rm service}}+ N_{\rm f}\cdot (L_{{\rm MAC}}+ L + L_{{\rm delimiter}}) + L_{{\rm tail}}}{L_{{\rm DBPS}}}\Big \rceil T_{{\rm symbol}} \end{array}\right.$$
(12)
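The frame durations of Eq. (12) can be computed as in the Python sketch below. The structure follows Eq. (12) directly; the function names are illustrative, and every numeric value in the example (field lengths in bits, \(L_{\rm DBPS}\), symbol duration in microseconds) is an assumption made for the sake of a runnable example, not a value taken from Table 1.

```python
import math


def t_phy(n_antennas):
    """PHY header duration in microseconds: T_PHY(N) = 36 + 4 * N."""
    return 36 + 4 * n_antennas


def t_mu_rts(n_antennas, l_service, l_mu_rts, l_tail, l_dbps, t_symbol):
    """First line of Eq. (12): duration of a MU-RTS frame."""
    n_symbols = math.ceil((l_service + l_mu_rts + l_tail) / l_dbps)
    return t_phy(n_antennas) + n_symbols * t_symbol


def t_a_mpdu(n_antennas, n_f, l_service, l_mac, l_data, l_delimiter,
             l_tail, l_dbps, t_symbol):
    """Second line of Eq. (12): duration of an A-MPDU with N_f frames."""
    payload = n_f * (l_mac + l_data + l_delimiter)
    n_symbols = math.ceil((l_service + payload + l_tail) / l_dbps)
    return t_phy(n_antennas) + n_symbols * t_symbol


if __name__ == "__main__":
    # All values below are assumed for illustration (bits and microseconds).
    params = dict(l_service=16, l_tail=6, l_dbps=104, t_symbol=4)
    print(t_mu_rts(4, l_mu_rts=160, **params))
    print(t_a_mpdu(4, n_f=1, l_mac=272, l_data=8000, l_delimiter=32, **params))
```

The ceiling operation reflects that a frame always occupies a whole number of OFDM symbols, so the duration increases in steps of \(T_{\rm symbol}\).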

The duration of a successful uplink transmission, \(T_{\rm s,up}\), is:

$$T_{\rm s,up} = \text {AIFS} +T_{\rm RTS} + T_{\rm Ant-CTS} + E_{\rm 2nd-slots}+ T_{\rm G-CTS} + T_{\rm A-MPDU} + T_{\rm G-ACK} + 4 \cdot \text {SIFS},$$
(13)

where \(E_{\rm 2nd-slots}\) stands for the average duration of the 2-nd contention round.

$$E_{\rm 2nd-slots}=(T_{\rm RTS} + \text {MU-SIFS})\cdot \sum _{\rm k=1}^{CW_{\rm 2nd}} p_{\rm k\_Slot} \cdot \text {k},$$
(14)

where \(p_{\rm k\_Slot}\) is the probability that there are \(\hbox {k} (\hbox {k} \in [1, CW_{\rm 2nd}])\) slots in the 2-nd contention round.
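Eq. (14) is an expected value over the slot-count distribution, scaled by the per-slot duration. A minimal Python sketch is given below; the function name, the list representation of \(p_{\rm k\_Slot}\) (index \(k\) holds the probability of \(k\) slots, index 0 is unused) and the example numbers are assumptions made for illustration.

```python
def expected_second_round_duration(p_k_slot, t_rts, mu_sifs):
    """Eq. (14): average duration of the 2-nd contention round.

    p_k_slot[k] is the probability that the round lasts k slots.
    """
    mean_slots = sum(k * p for k, p in enumerate(p_k_slot))
    return (t_rts + mu_sifs) * mean_slots


if __name__ == "__main__":
    # Assumed distribution for CW_2nd = 2 and assumed durations (microseconds).
    print(expected_second_round_duration([0.0, 0.375, 0.625], 60.0, 25.0))
```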

As a STA cannot tell whether collisions in the 1-st round are caused by the AP or by other STAs, the collision time has to be set according to the duration of the longer frame:

$$T_{\rm c} = \text {AIFS} +T_{\rm MU-RTS} + N\cdot (T_{\rm MU-CTS} + \text {SIFS}).$$
(15)

The average duration of a channel slot, where \(\sigma\) denotes the duration of an empty backoff slot, is:

$$T_{\rm average} = \alpha \cdot p_{\rm s} \cdot T_{\rm s,down} + (1-\alpha )\cdot p_{\rm s} \cdot T_{\rm s,up} +p_{\rm c}\cdot T_{\rm c} + p_{\rm i}\cdot \sigma.$$
(16)

Equation (17) gives a simple example to calculate \(p_{2\_{\rm ant}}\), in which case, the AP has 2 antennas and \(CW_{\rm 2nd} =2\):

$$p_{2\_{{\rm ant}}}= \left( {\begin{array}{c}M-1\\ 1\end{array}}\right) \frac{1}{CW_{{\rm 2nd}}}\left( 1-\frac{1}{CW_{{\rm 2nd}}}\right) ^{M-2} + \left( {\begin{array}{c}M-1\\ 1\end{array}}\right) \frac{1}{CW_{{\rm 2nd}}}\left( 1-\frac{1}{CW_{{\rm 2nd}}}\right) ^{M-2}\cdot p_{1\_{{\rm fail}}}.$$
(17)

The first part of Eq. (17) represents the case in which exactly one STA succeeds in the 1-st slot. The second part represents the case in which exactly one STA succeeds in the 2-nd slot, conditioned on the failure of the 1-st slot (\(p_{\rm 1\_fail}\): no STA or more than one STA chooses the 1-st slot). Note that a similar condition is not required for the first part, because the 2-nd round contention finishes as soon as a STA wins the 1-st slot, regardless of which slots the other STAs choose. As \(CW_{\rm 2nd}\) increases, a closed form of \(p_{\rm 2\_ant}\) becomes infeasible because of the many combinations of conditions under which a STA can succeed in different slots. Therefore, we use the Monte Carlo method to calculate \(p_{\rm x\_ant}\) and \(p_{\rm k\_Slot}\), the pseudo code of which is shown in Algorithm 1.

Algorithm 1 Monte Carlo calculation of \(p_{\rm x\_ant}\) and \(p_{\rm k\_Slot}\)
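A Monte Carlo estimate of \(p_{\rm x\_ant}\) and \(p_{\rm k\_Slot}\) can be sketched in Python as follows. This is not a reproduction of the authors' pseudo code; it is one possible reading of the 2-nd round rules described in Sect. 3.3, under these assumptions: the network is saturated, so the \(M-1\) STAs that did not win the 1-st round all contend; each draws \(BO_{\rm 2nd}\) uniformly from \([0,CW_{\rm 2nd}-1]\) once; a slot is successful only if exactly one STA picked it; and the round ends when all \(N\) antennas are taken or after \(CW_{\rm 2nd}\) slots. The function name and default arguments are hypothetical.

```python
import random


def simulate_second_round(m, n_antennas, cw_2nd, trials=100_000, seed=1):
    """Estimate p_x_ant and p_k_slot by simulation.

    Returns two lists: p_x_ant[x] is the estimated probability that x AP
    antennas are used, p_k_slot[k] that the 2-nd round lasts k slots.
    """
    rng = random.Random(seed)
    contenders = m - 1                      # the 1-st round winner is excluded
    ant_counts = [0] * (n_antennas + 1)
    slot_counts = [0] * (cw_2nd + 1)
    for _trial in range(trials):
        # Number of STAs that picked each backoff value in [0, CW_2nd - 1].
        picks = [0] * cw_2nd
        for _sta in range(contenders):
            picks[rng.randrange(cw_2nd)] += 1
        used, slots = 1, 0                  # one antenna taken in round 1
        for s in range(cw_2nd):
            if used == n_antennas:          # all antennas occupied: stop early
                break
            slots += 1
            if picks[s] == 1:               # exactly one RTS: success
                used += 1
        ant_counts[used] += 1
        slot_counts[slots] += 1
    p_x_ant = [c / trials for c in ant_counts]
    p_k_slot = [c / trials for c in slot_counts]
    return p_x_ant, p_k_slot


if __name__ == "__main__":
    p_x, p_k = simulate_second_round(m=8, n_antennas=4, cw_2nd=8)
    print("p_x_ant :", [round(p, 3) for p in p_x])
    print("p_k_slot:", [round(p, 3) for p in p_k])
```

For a small case such as \(M=4\), \(N=2\) and \(CW_{\rm 2nd}=2\), the estimate can be checked by enumerating the eight equally likely backoff choices of the three contending STAs, which gives a probability of 0.75 that both antennas are used under the assumptions above.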

Finally, the collision probability of a node,

$$P_{\rm collision} = 1-(1-\tau )^M,$$
(18)

and down/up-link throughput are derived:

$$\left\{ \begin{array}{l} S_{{\rm down}} = \frac{N_{{\rm b,down}}}{T_{{\rm average}} } \\ S_{\rm up} = \frac{N_{{\rm b,up}}}{T_{{\rm average}}}.\end{array}\right.$$
(19)

The transmission probability \(\tau\) and Eqs. (18) and (19) form a non-linear system, which can be solved by an iterative numerical technique [31].
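To show how the pieces of the model fit together, the Python sketch below combines Eqs. (9), (10), (16) and (19) into a single function. It takes the slot probabilities, the antenna-usage distribution and the durations as inputs rather than deriving them, and the function name, argument names and example numbers are assumptions made for illustration. With lengths in bits and durations in microseconds, the result is in Mbps.

```python
def throughput(m, n, n_f, l_bits, p_i, p_s, p_c, p_x_ant,
               t_s_down, t_s_up, t_c, sigma):
    """Return (S_down, S_up) following Eqs. (9), (10), (16) and (19).

    p_x_ant[x] is the probability that x AP antennas are used in the uplink
    (index 0 unused).
    """
    alpha = 1.0 / (m + 1)                   # probability the AP transmits
    n_b_down = alpha * n * n_f * l_bits * p_s                        # Eq. (9)
    mean_streams = sum(x * p_x_ant[x] for x in range(1, n + 1))
    n_b_up = (1.0 - alpha) * n_f * l_bits * p_s * mean_streams       # Eq. (10)
    t_average = (alpha * p_s * t_s_down
                 + (1.0 - alpha) * p_s * t_s_up
                 + p_c * t_c
                 + p_i * sigma)                                      # Eq. (16)
    return n_b_down / t_average, n_b_up / t_average                  # Eq. (19)


if __name__ == "__main__":
    # Toy numbers chosen only to exercise the formulas.
    s_down, s_up = throughput(m=1, n=1, n_f=1, l_bits=1000,
                              p_i=0.5, p_s=0.4, p_c=0.1, p_x_ant=[0.0, 1.0],
                              t_s_down=100.0, t_s_up=100.0, t_c=50.0,
                              sigma=10.0)
    print(s_down, s_up)
```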

4.2 System performance against \(CW_{\rm 2nd}\)

In this sub-section, the performance of Uni-MUMAC is evaluated by increasing \(CW_{\rm 2nd}\), with the goal of finding a \(CW_{\rm 2nd}\) value that maximizes the system performance. Two traffic conditions are considered: (1) the saturated one, as shown in Fig. 9, and (2) the non-saturated one, as shown in Fig. 10. The saturated condition means that both the AP and the STAs always have frames to transmit. There is no 2-nd round channel access when the AP has 1 antenna, which is why the results remain constant when \(N=1\). Note that the plots include both analysis and simulation results for the saturated condition, but only simulation results for the non-saturated condition.

Fig. 9 Saturated throughput against \(CW_{\rm 2nd}\)

Fig. 10 Non-saturated throughput & Average delay

As shown in Fig. 9, when the WLAN is saturated (i.e., both downlink and uplink are saturated), \(CW_{\rm 2nd}\) has very little impact on the downlink throughput (AP’s throughput). For the uplink, however, choosing an appropriate \(CW_{\rm 2nd}\) is important. For example, the uplink throughput (STAs’ throughput) approaches its maximum when \(CW_{\rm 2nd} \in [8,12]\) for \(M=8\) [Fig. 9(a)] and when \(CW_{\rm 2nd} \in [12,16]\) for \(M=15\) [Fig. 9(b)].

In the non-saturated condition, we set the traffic load for each STA and the AP to 1.4 and 11.2 Mbps, respectively. In Fig. 10(a), the downlink throughput (\(N=2\) and 4) obtains the highest value when \(CW_{\rm 2nd} \in [4,8]\), and then decreases as \(CW_{\rm 2nd}\) keeps increasing. The reason is that the continuous increase of \(CW_{\rm 2nd}\) leads to longer uplink transmissions that harm the downlink ones. Figure 10(b) shows that the average delay increases as \(CW_{\rm 2nd}\) increases. Note that, the average delay remains at a relatively low level when the system is in the non-saturated condition, for example, the average delay of STAs when \(CW_{\rm 2nd} \in [4,34]\) and the average delay of the AP when \(N=4\) and \(CW_{\rm 2nd} \in [4,8]\). However, the average delay of the AP \((N=4)\) increases sharply as the downlink traffic approaches saturation.

It is also observed that the downlink throughput, as the network becomes saturated, is much lower than the uplink one. The reasons are as follows. First, there is the AP bottle-neck effect: the AP manages all traffic to and from the STAs in a WLAN, while it has the same probability of accessing the channel as each STA because of the random backoff mechanism of CSMA/CA. Second, the inherently high traffic load at the AP means that the downlink is saturated most of the time. Third, a favorable value of \(CW_{\rm 2nd}\) for the uplink does not bring the same benefit to the downlink. For example, as shown in Fig. 9, the uplink obtains the highest throughput when \(CW_{\rm 2nd}\) is set approximately to \(M \, (CW_{\rm 2nd} \approx M)\), while the downlink transmission prefers a value of \(CW_{\rm 2nd}\) as small as possible.

In order to mitigate the AP bottle-neck effect and compensate for the downlink disadvantage when STAs choose a large \(CW_{\rm 2nd}\), we set the maximum number of frames that the AP can aggregate in an A-MPDU to \(M\) \((N_{\rm f} \le M)\), while keeping the number of frames aggregated by each STA at 1 in the following simulations. Also, the queue length of the AP is set to increase quadratically with the number of STAs \((Q_{\rm ap}=M^2)\) to statistically guarantee that there are enough frames destined to different STAs [30].

In Figs. 11 and 12, the performance of Uni-MUMAC is evaluated in the same condition as done in Figs. 9 and 10 except that the network adopts the new frame aggregation scheme (AP’s \(N_{\rm f} \le M\), STA’s \(N_{\rm f} =1\)) and the new queue length \((Q_{\rm ap}=M^2, Q_{\rm sta}=50)\). The results show that Uni-MUMAC manages to avoid the extremely low downlink throughput when the system is saturated (Fig. 11) and keeps the downlink transmission always in the non-saturation area [Fig. 12(a)], which is not achieved in Fig. 10(a). The average delay of the AP [Fig. 12(b)] is much lower compared to that of the AP in Fig. 10(b), which is because the system remains in the non-saturated condition by employing the frame aggregation scheme.

Fig. 11 Saturated throughput when AP aggregates frames

Fig. 12 Non-saturated throughput and average delay when AP aggregates frames

The results in Fig. 11 also show that the system roughly reaches its maximum performance when \(CW_{\rm 2nd} \in [M-4,M+4]\). For example, when the AP has 4 antennas, the system throughput (AP+STA) peaks for \(CW_{\rm 2nd} \in [6,8]\) when \(M=8\) and for \(CW_{\rm 2nd} \in [12,16]\) when \(M=15\). Therefore, \(CW_{\rm 2nd}\) is fixed to \(M\) in the following simulations.

4.3 System performance against \(M\)

In this sub-section, the performance of Uni-MUMAC is evaluated against the number of STAs in the downlink-dominant and the down/up-link balanced traffic scenarios, where \(M\) is increased from 1 to 35, the maximum number of frames aggregated at the AP is set to \(M\) and the 2-nd round Contention Window is also set to \(M\). The two traffic scenarios are specified as follows.

  1. Downlink-dominant: This is the traditional WLAN traffic scenario, where the AP manages a much heavier traffic load than the STAs. Therefore, the downlink load that the AP carries for each STA is set to 4 times that STA's own load, i.e., the total load of the AP is 4 times the aggregate load of the STAs. For instance, if the traffic load of a STA is 0.8 Mbps and there are 5 STAs, the traffic load of the AP will be \(4 \cdot 0.8 \cdot 5=16\) Mbps.

  2. Down/up-link balanced: This WLAN traffic type includes not only P2P applications, which have been around for some years, but also emerging content-rich file sharing and video calling applications. Therefore, the downlink load that the AP carries for each STA is set equal to that STA's own load, i.e., the total load of the AP equals the aggregate load of the STAs. In this case, if there are 5 STAs and each STA has a 0.8 Mbps traffic load, the traffic load of the AP will be \(0.8 \cdot 5=4\) Mbps.
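
The two load settings can be reproduced with a short Python sketch. The function name, the scenario labels and the dictionary of ratios are hypothetical helpers introduced here for illustration. Only the ratios (4 and 1) and the worked figures come from the text.

```python
# Ratio of the AP's per-STA downlink load to each STA's uplink load.
# The keys are illustrative labels, not identifiers from the paper.
AP_TO_STA_LOAD_RATIO = {
    "downlink-dominant": 4.0,
    "balanced": 1.0,
}


def ap_traffic_load_mbps(sta_load_mbps: float, num_stas: int, scenario: str) -> float:
    """Total AP load in Mbps for num_stas STAs that each offer sta_load_mbps."""
    if scenario not in AP_TO_STA_LOAD_RATIO:
        raise ValueError(f"unknown scenario: {scenario!r}")
    return AP_TO_STA_LOAD_RATIO[scenario] * sta_load_mbps * num_stas


if __name__ == "__main__":
    for name in AP_TO_STA_LOAD_RATIO:
        # 5 STAs at 0.8 Mbps each, as in the examples above.
        print(name, round(ap_traffic_load_mbps(0.8, 5, name), 3), "Mbps")
```

With 5 STAs at 0.8 Mbps each, the sketch gives approximately 16 Mbps for the downlink-dominant scenario and 4 Mbps for the balanced one, matching the examples in the list.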

The multi-user MAC scheme (LI-MAC) proposed by Li et al. [24] is implemented and used as a reference (labeled AP/STA-LI in the legends) for comparison with Uni-MUMAC. For a fair comparison, LI-MAC and Uni-MUMAC adopt the same configuration parameters (as shown in Table 1). The key features of LI-MAC and Uni-MUMAC are summarized in Table 2.

Table 2 Key features of LI-MAC and Uni-MUMAC

Figure 13(a) shows the throughput as the number of STAs increases in the downlink-dominant traffic scenario. Employing a higher number of antennas at the AP is clearly advantageous. The downlink throughput is much higher than the uplink one before the system gets saturated, for two reasons: (1) the AP traffic load is inherently higher than that of the STAs, and (2) the AP adopts the frame aggregation scheme. Once the system becomes saturated, the throughput of both downlink and uplink decreases as \(M\) increases.

Fig. 13 Throughput against \(M\)

As shown in Fig. 13(a), the uplink throughput of LI-MAC \((N=4)\) is the same as that of Uni-MUMAC \((N=1)\), because LI-MAC adopts the baseline DCF in the uplink. As the uplink throughput approaches saturation \((M=15)\), the downlink throughput of LI-MAC starts to decrease. The downlink throughput of Uni-MUMAC achieves higher gains when the network is not saturated, because the proposed 2-nd round transmission increases the uplink transmission efficiency and therefore decreases the number of stations contending with the AP for the channel. However, as the number of STAs increases further and both uplink and downlink saturate, LI-MAC outperforms Uni-MUMAC, because Uni-MUMAC suffers a high collision rate in the 2-nd round, which prolongs the 2-nd round duration. It is nevertheless important to point out that neither LI-MAC nor Uni-MUMAC is able to work sustainably in the saturated condition.

Figure 13(b) shows the throughput against \(M\) in the down/up-link balanced traffic scenario. As expected, Uni-MUMAC achieves balanced downlink and uplink throughput. This is because the AP and STAs are set to have the same traffic load and, more importantly, because the frame aggregation scheme (AP’s \(N_{\rm f} \le M\), STA’s \(N_{\rm f} =1\)) counteracts the STAs’ collective advantage in channel access.

Compared with Uni-MUMAC, LI-MAC achieves a higher downlink throughput when the uplink is saturated, because the duration of collisions in the uplink of LI-MAC is much shorter than that of Uni-MUMAC. The drawback, however, is that LI-MAC has a large throughput gap between the AP and the STAs, which does not satisfy the traffic requirements of the considered scenario.

Figure 14 shows the average delay against \(M\). Both downlink and uplink delays increase with \(M\), and grow significantly as the downlink or the uplink traffic approaches saturation. After the system gets saturated, the average delay becomes steady. It is worth pointing out that the average delay of the STAs is higher than that of the AP when \(M\) becomes large. The reason is that the transmission duration of the AP gets longer as \(M\) increases (due to the frame aggregation scheme), which makes the STAs wait longer to access the channel.

Fig. 14 Average delay against \(M\)

Figure 15 shows that the 1-st round collision probability increases with \(M\) and converges when the system becomes saturated, which confirms the down/up-link saturation trend discussed for Figs. 13 and 14. It is interesting to note that the collision probability of the STAs is higher than that of the AP when the system is non-saturated. The reason is that a STA transmits less frequently than the AP in the non-saturated condition, which results in a lower conditional collision probability for the AP. This can be seen from Eq. 20, where \(p_{\rm ap}\) and \(\tau _{\rm ap}\) (\(p_{\rm sta}\) and \(\tau _{\rm sta}\)) are the 1-st round collision probability and the transmission probability of the AP (or a STA) in the non-saturated condition; since \(\tau _{\rm ap} > \tau _{\rm sta}\), replacing one factor \((1-\tau _{\rm sta})\) by \((1-\tau _{\rm ap})\) gives \(p_{\rm sta} > p_{\rm ap}\):

$$\left\{ \begin{array}{l} p_{\rm ap}=1-(1-\tau _{\rm sta})^{M} \\ p_{\rm sta}=1-(1-\tau _{\rm sta})^{M-1}\cdot (1-\tau _{\rm ap}). \end{array}\right.$$
(20)

Fig. 15 1-st round collision probability against \(M\)
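
As a numerical illustration of Eq. 20, the following Python sketch evaluates both collision probabilities. The function name and the sample transmission probabilities are assumptions chosen only to show the ordering of the two values. They are not results from the paper.

```python
def first_round_collision_probs(tau_ap: float, tau_sta: float, num_stas: int):
    """Evaluate Eq. 20 and return (p_ap, p_sta).

    tau_ap, tau_sta: per-slot transmission probabilities of the AP and a STA.
    num_stas: number of STAs (M).
    """
    if num_stas < 1:
        raise ValueError("num_stas (M) must be at least 1")
    # The AP collides if at least one of the M STAs transmits.
    p_ap = 1.0 - (1.0 - tau_sta) ** num_stas
    # A STA collides if any of the other M-1 STAs or the AP transmits.
    p_sta = 1.0 - (1.0 - tau_sta) ** (num_stas - 1) * (1.0 - tau_ap)
    return p_ap, p_sta


if __name__ == "__main__":
    # Illustrative values only: the AP transmits more often than each STA.
    p_ap, p_sta = first_round_collision_probs(tau_ap=0.05, tau_sta=0.01, num_stas=10)
    print(f"p_ap={p_ap:.4f}, p_sta={p_sta:.4f}")
```

Whenever `tau_ap` exceeds `tau_sta`, the sketch returns a larger value for `p_sta` than for `p_ap`, and the two coincide when the transmission probabilities are equal.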

Figure 16 shows the 2-nd round collision probability against \(M\). The 2-nd round collision probability is clearly higher when the system traffic load is higher. When the number of STAs is low, the 2-nd round collision probability with 2 antennas at the AP is sometimes lower than that with 4 antennas. The reason is that a higher number of antennas at the AP usually means a longer 2-nd contention round, which increases the chances of collisions in that round. For example, when the AP employs 2 antennas, the 2-nd contention round finishes as soon as a STA wins the one still-available antenna of the AP, whereas when the AP employs more than 2 antennas, the 2-nd contention round continues, thereby increasing the 2-nd round collision probability.

Fig. 16 2-nd round collision probability against \(M\)

5 Conclusions and future research challenges

In this paper, a unified MU-MIMO MAC protocol called Uni-MUMAC, which supports both MU-MIMO downlink and uplink transmissions for IEEE 802.11ac WLANs, is proposed. We evaluate it through an analytic model and simulations. A prominent MAC scheme from the literature is implemented and compared with Uni-MUMAC.

By analyzing the simulation results, we observe that the 2-nd round Contention Window \(CW_{\rm 2nd}\), when tuned to optimize the uplink transmission, does not bring the same benefit to the downlink. An adaptive frame aggregation scheme and a queue scheme are applied at the AP to offset this disadvantage. With these parameters properly set, the results show that a WLAN implementing Uni-MUMAC is able to avoid the AP bottle-neck problem and performs well in both the traditional downlink-dominant and the emerging down/up-link balanced traffic scenarios. The results also show that a higher system capacity can be achieved by employing more antennas at the AP.

Uni-MUMAC gives us insight into the interaction of down/up-link transmissions and into how the different parameters that control the system can be tuned to achieve the maximum performance. Based on this study, we consider the following aspects as future research challenges or next steps for Uni-MUMAC.

  1. Adaptive scheduling scheme: As discussed in the paper, a parameter that optimizes the uplink could be unfavorable to the downlink. Therefore, an adaptive scheduling algorithm that takes several key parameters into account and compensates those STAs whose interests are harmed would play a significant role in obtaining the maximum performance while maintaining fairness. As the results imply, these parameters include the size of the A-MPDU, the queue length, the spatial-stream/frame allocation, the number of nodes/antennas, and other key parameters that control down/up-link transmissions.

  2. Traffic differentiation: Another future research challenge is to provide a new traffic differentiation capability in the uplink in addition to the one defined in the IEEE 802.11e amendment [32]. The new traffic differentiation should be able to limit the number of STAs that can participate in the 2-nd contention round in order to reduce 2-nd round collisions. A possible solution could be to create a table at the AP with information about the priority of each traffic flow and the queue length of each STA, and then to utilize this table to control the 2-nd contention round.

  3. Multi-hop mesh networks: More challenges need to be considered when designing a MAC protocol for multi-hop wireless networks. First, there is the hidden-node problem: finding mechanisms that efficiently resolve the collisions caused by hidden nodes is still an open challenge. The collision-free scheme proposed in [33] or handshake-based coordinated access could be a starting point for combating hidden-node collisions in wireless mesh networks. Second, due to the heterogeneity of mesh nodes (e.g., different numbers of antennas at the nodes), MAC protocols for wireless mesh networks need to be designed with the capability of swiftly switching among MU-MIMO, SU-MIMO, multi-packet and single-packet transmission schemes. Third, MAC and routing protocols need to be jointly designed. There could be multiple destinations involved in a MU-MIMO transmission, and some destinations could be out of the one-hop transmission range, in which case routing strategies should be able to forward multiple packets to different nodes in parallel.
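
To make the table-based idea under the traffic differentiation item more concrete, the following Python sketch shows one way such an AP-side table could be used. Everything in it is hypothetical: the entry fields, the convention that a larger priority value means more urgent traffic, and the ranking rule (priority first, then queue length) are assumptions for illustration and not a mechanism specified in this paper or in IEEE 802.11e.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StaEntry:
    """One row of a hypothetical AP-side table."""
    sta_id: int
    priority: int   # assumed convention: larger value = more urgent traffic
    queue_len: int  # number of frames the STA reports as queued


def admit_to_second_round(table: List[StaEntry], max_contenders: int) -> List[int]:
    """Pick the STAs allowed to contend in the 2-nd round.

    Assumed rule: ignore STAs with empty queues, then rank by priority,
    break ties by longer queue, and finally by STA id for determinism.
    """
    if max_contenders < 0:
        raise ValueError("max_contenders must be non-negative")
    backlogged = [entry for entry in table if entry.queue_len > 0]
    ranked = sorted(backlogged, key=lambda e: (-e.priority, -e.queue_len, e.sta_id))
    return [entry.sta_id for entry in ranked[:max_contenders]]
```

Capping the list at `max_contenders` is what limits the number of 2-nd round participants. How that cap should depend on the number of antennas or on \(CW_{\rm 2nd}\) is left open here, as it is in the text.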