1 Introduction

By 2019, 80% of all Internet traffic is expected to consist of consumer video applications [1]. This vast amount of video traffic will mostly be delivered over wireless links, including omnipresent IEEE 802.11 based Wireless Fidelity (WiFi) devices, thanks to the proliferation of smart phones and tablets. New high speed WiFi technologies, such as IEEE 802.11n and IEEE 802.11ac, provide throughput improvements mainly by leveraging Multiple Input Multiple Output (MIMO) techniques and bandwidth expansion at the Physical (PHY) layer. However, MIMO links exhibit significant rate fluctuations that should be taken into account when scheduling a transmission. Moreover, the higher data rates available via MIMO cannot be efficiently utilized due to contention overhead. Therefore, efficient channel access methods at the Medium Access Control (MAC) layer have been proposed to further improve throughput [2].

Frame aggregation, introduced in IEEE 802.11n [2], makes better use of the high speed channel by allowing back-to-back contention-free transmissions during a transmission opportunity (TXOP), amortizing various overheads over multiple frames. There are two types of aggregation specified by the IEEE802.11 standard, namely, Aggregate MAC Service Data Unit (A-MSDU) and Aggregate MAC Protocol Data Unit (A-MPDU). In A-MSDU, multiple link layer payloads are aggregated before being processed by the MAC layer. A-MPDU, on the other hand, refers to the aggregation of bona fide MAC layer frames, each with its own MAC header, just below the MAC layer, before they are handed to the PHY layer for transmission.

A-MSDU has a smaller header overhead, but it potentially suffers from a high frame error rate because all aggregated frames share a single checksum; this limits A-MSDU to a small size. A-MPDU, on the other hand, keeps a separate MAC header and an independent checksum for each frame and shares only the PHY header and contention overhead. Given the large aggregate sizes required, we only consider A-MPDU in this paper.

1.1 Motivation and contributions

The potential performance improvements of aggregation are best realized for high throughput, non-interactive real-time applications, such as live/stored video streaming. Interactive applications, on the other hand, are often constrained by small delay bounds, allowing only limited aggregation. According to ITU recommendation G.114, video and voice conferencing applications should have network latency limited to 150 ms for good user experience [3]. Even for a high quality video conference at 1 Mbps, this is equivalent to only 18.75 KB of data per transmission, resulting in an aggregation size of roughly 12 maximum size frames. However, if this were a video streaming application with a delay requirement of 5 s,Footnote 1 then each transmission could accommodate up to 625 KB, equivalent to about 415 aggregated frames. Moreover, the benefit of aggregation in the uplink direction is generally limited, even for applications such as video streaming, due to the small amount of traffic that flows in the uplink direction. Hence, without loss of generality, in this paper we consider aggregation in the downlink direction. To obtain the best performance, the channel conditions should be taken into account while setting the aggregation size. For instance, if the bit-rate is low, a large TXOP, i.e., a large aggregation size, will waste valuable wireless resources. Likewise, transmission of small packets at high data rates will under-utilize the high speed links.
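The back-of-the-envelope arithmetic above can be reproduced in a few lines; the sketch below is illustrative only and assumes a 1500-byte maximum-size frame, which is our assumption rather than a figure stated in the standard.

```python
# Rough per-transmission aggregation budget for a given bit rate and delay bound.
# Assumes 1500-byte maximum-size frames; purely illustrative arithmetic.

def aggregation_budget(bitrate_bps, delay_bound_s, frame_bytes=1500):
    budget_bytes = bitrate_bps * delay_bound_s / 8   # data that may be buffered per transmission
    return budget_bytes, budget_bytes / frame_bytes  # (bytes, max-size frames)

print(aggregation_budget(1e6, 0.150))  # ~18750 B and ~12 frames (conferencing)
print(aggregation_budget(1e6, 5.0))    # ~625000 B and ~415 frames (streaming)
```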

In this article, we use effective capacity theory [4] to appropriately set the size of a TXOP over a fluctuating channel. If the size of a TXOP is adjusted such that the effective capacity provided to a link is as large as its traffic arrival rate, then a certain delay violation probability can be guaranteed. The direct use of effective capacity theory can be computationally difficult, as it requires knowledge of the channel service process; we therefore apply an approximation to derive a simpler surrogate problem that considers only queue level metrics. Despite being non-convex, this problem is solved using a Proportional–Integral–Derivative (PID) controller. The main contributions of our work and the characteristics of our solution are as follows:

  1. A frame aggregation algorithm based on the theory of effective capacity is proposed to provide statistical delay guarantees over the time-varying wireless channels of high speed IEEE 802.11 WLANs.

  2. A novel metric, which quantitatively captures the amount of surplus resource provisioning over a wireless link given a certain statistical delay guarantee, is proposed and then utilized in devising the frame aggregation algorithm.

  3. Analytical results are provided showing that the aggregation size obtained by the proposed algorithm is very close to the value derived from the queuing model approximation of the link.

  4. The proposed aggregation scheme requires no explicit channel state information; it relies only on queue level metrics that are readily measured at the MAC layer.

  5. The proposed aggregation scheme is implemented entirely at the Access Point (AP) side and operates over the software of commodity WiFi devices without modifying the standard. This is demonstrated by implementing the proposed aggregation scheme over an IEEE802.11 AP.

  6. The proposed aggregation scheme is shown to improve the capacity and delay performance for both Constant Bit Rate (CBR) traffic and Variable Bit Rate (VBR) video streaming traffic over IEEE 802.11 WLANs compared to prominent scheduling schemes, as demonstrated by NS-3 simulations and experiments on a real test bed. The proposed aggregation method is also shown to enhance video quality in terms of Peak Signal to Noise Ratio (PSNR), especially when the number of users increases, as observed in the simulation and test results with a real, HD MPEG4 video streaming application.

1.2 Related work

Frame aggregation is a widely used technique over wireless networks. In [5], aggregation is used simply to improve throughput over WLANs. On the other hand, [6] uses frame aggregation over multi-hop WiFi wireless mesh networks to compensate for the accumulated delay of VoIP packets traversing the network. A rather different use of aggregation is proposed in [7, 11] for millimeter-wave directional wireless networks. In [7], it is shown that by using the maximum aggregation size and reducing the number of channel accesses, such networks experience a considerably lower deafness-induced collision rate. In [8], considering frame aggregation and bidirectional transmission, an analytical model is developed to study the performance of IEEE 802.11n for voice and video applications; two new aggregation schemes are also proposed, and MAC-level aggregation is shown to significantly improve voice and video capacity. In [9], a multi-QoS aggregation scheme is proposed for fairness among the Access Categories (ACs), in which an aggregated frame comprises packets of different ACs. In [10], an adaptive A-MPDU aggregation scheme for VoIP applications is proposed. In that scheme, the end-to-end delay of the head-of-line packet is estimated, and as long as it remains below the 150 ms end-to-end delay budget of VoIP, the sender waits for more packets; once the estimated end-to-end delay of the head-of-line packet reaches 150 ms, all queued packets are aggregated and transmitted. Interestingly, the Deadline approach considered in this paper is very similar to the aforementioned scheme but assumes the larger delay bound of video streaming.

When aggregation is considered, the common practice is to set the TXOP so that all outstanding packets for a link are transmitted [12,13,14]. Furthermore, in [15], the next TXOP is allocated to be large enough to consume both outstanding and newly arriving packets by predicting the traffic generated by a VBR video source with long range dependency. Although this aggregation policy is very simple and can result in low delay under light load, it can also exhibit poor channel efficiency, as it potentially causes a large number of small-size transmissions, which can significantly increase delay or reduce capacity when the channel operates close to saturation.

In addition, [16] identifies the problem of re-transmitted sub-frames within an A-MPDU, which degrade throughput due to fragmentation of the A-MPDU and fluctuation in its size. A two-level aggregation scheme is then proposed that compensates for the total aggregated frame size by cleverly using A-MSDU aggregation according to the channel Bit Error Rate (BER) conditions. This work, however, neither considers QoS nor takes into account variations in channel state; it is only concerned with improving throughput.

The throughput impact of frame aggregation on TCP connections for various access categories of IEEE802.11ac is considered in [17]. The optimal aggregation sizes are then derived for the uplink and downlink directions using a Markov model of the link. However, only a single-link network with no channel variations is considered. Alternatively, [18] proposes a completely new MAC and PHY mechanism that allows aggregation of multiple frames to different destinations within a single transmission. This approach outperforms IEEE802.11 aggregation with respect to both throughput and delay and is applicable to dense hot-spots; however, it is not compatible with current IEEE802.11 technology.

Hybrid Coordination Function (HCF) Controlled Channel Access (HCCA) mode of IEEE802.11n has been used in [14, 19] to perform TXOP allocation. Specifically, [19] allocates TXOPs according to a token bucket, matching the VBR traffic specifications; whereas [14] adjusts TXOP based on queue leftover. An earliest deadline first scheduler is also used by [19] to decide which station’s packets should be transmitted. Our approach, on the other hand, works with Enhanced Distributed Channel Access (EDCA) which, unlike HCCA, is deployed by all commercial 802.11 devices.

A few works have considered the link quality variations in making aggregation decisions [20,21,22]. For example, in [20], TXOP is increased for links with low packet error rate (PER) and it is decreased otherwise. This scheme is purely heuristic with no explicit consideration of delay, and it works best if the channel changes very slowly.

Recently, the authors in [21] have proposed a channel scheduling approach that takes into account rate fluctuations as a result of a slowly fading channel in an LTE system. Users are allocated channel times proportional to the ratio of their current instantaneous rate to their average rate. The scheme requires knowledge of instantaneous channel rate, and delay QoS is not considered at all. In [22] another TXOP allocation scheme considering channel variations is proposed, where a lead–lag compensator borrows TXOPs from bad links and assigns them to those with good channels. A mathematical proof of long term fairness and a bound on the amount of TXOP difference between any two links are provided; however, QoS is not considered.

The authors in [23] propose a scheme to adaptively tune the frame aggregation size according to channel conditions. The objective is to reduce the number of re-transmissions due to channel errors and collisions, which is exacerbated by larger aggregation sizes. The goal of the proposed method, named AFLAS, is merely to improve throughput. Moreover, [24] uses channel state information to adaptively select the proper aggregation size and MIMO mode for IEEE 802.11ac links with the objective of improving throughput. Although the approach considers a time-varying channel, it requires channel state information to make decisions, unlike our proposed scheme, which does not look into PHY level information but relies on queue level information. In addition, we consider a statistical QoS guarantee, which is not considered in [24].

A recent study in [25] has verified the accuracy of effective capacity applied to IEEE802.11 networks. Our proposed aggregation algorithm is based on the concept of effective capacity, so that a statistical delay QoS, in the form of a target delay violation probability for a given delay bound, is guaranteed. Our algorithm achieves this by monitoring only a few queue level performance metrics. Therefore, channel conditions and variations are taken into account without requiring explicit channel state information. Our scheme not only provides statistical QoS guarantees, but it is also much simpler to implement than the existing works.

Most of the previous works do not explicitly consider delay together with channel variations. Only recently, in [26], an aggregation algorithm that explicitly considers packet deadlines was proposed. In that work, the aggregation size is determined such that the aggregate frame transmission time does not exceed the time remaining until the deadline of the head-of-line (HoL) packet. Annese et al. [27] propose a control theoretic approach for providing delay guarantees over HCCA.

In our previous work [28], we provided preliminary results for a new aggregation algorithm based on a PID controller, which adaptively changes the aggregation size according to channel statistics as viewed through link level metrics. In the current paper, we extend [28] by additionally considering the performance of video traffic and by providing results obtained from an actual implementation of our approach over a testbed. Moreover, we also provide an analytical solution to the frame aggregation problem which supports our simulations.

1.3 Paper organization

The rest of the paper is organized as follows: Sect. 2 provides the background on effective capacity, explaining how it can be employed as a framework for statistical QoS guarantees and how queue level metrics can be related to the effective capacity of a link. Section 3 describes our system model, the optimization problem for downlink resource allocation, i.e., aggregation, with statistical QoS, and the queuing model approximations. Section 4 presents our proposed practical aggregation algorithm, which is based on a PID controller. Section 5 presents our simulation results (Sect. 5.1) as well as our testbed implementation and experimental results (Sect. 5.2). The paper is concluded in Sect. 6.

2 Effective capacity link model

In their seminal work, Wu and Negi [4] show that for a stationary Markov fading channel with a stochastic service rate and an input traffic rate of \(\lambda\), the effective capacity of the channel is,

$$\begin{aligned} EC(\theta ,\lambda ) = \Gamma (\theta ) = -\lim _{t \rightarrow \infty }\frac{\lambda }{\theta t} \log {\mathsf {E}}\left[ e^{-\frac{\theta }{\lambda }C(t)} \right] , \end{aligned}$$
(1)

where C(t) is the cumulative random service process of the channel and \(\theta\) is the so-called “QoS index” reflecting the strictness of the delay guarantee required. Consider a queue of infinite size receiving data with rate \(\lambda\) and being served according to this service process C(t). It has been shown in [29] that the probability of the queue length Q(t) exceeding some threshold \(B_{max}\) is approximated by,

$$\begin{aligned} \sup _{{\mathsf {t}}} {\mathsf {Pr}}\{Q(t) \ge B_{max} \} \approx \gamma (\lambda ) e^{-\frac{\theta (\lambda )}{\lambda }B_{max}}, \end{aligned}$$
(2)

where according to [29], \(\gamma (\lambda ) = {\mathsf {Pr}}\{Q(t) > 0\}\) is the probability that the queue is non-empty and \(\theta (\lambda )\) is the solution to \(\Gamma (\theta )=\lambda\), i.e.,

$$\begin{aligned} \theta (\lambda ) = \Gamma ^{-1}(\lambda ). \end{aligned}$$
(3)

This approximation is shown to be valid for moderate and large values of \(B_{max}\) and for both VBR and CBR traffic [4]. If, alternatively, the delay experienced by a packet arriving at the queue at time t is of interest, then the following approximation can be used,Footnote 2

$$\begin{aligned} {\mathsf {Pr}}\{D(t) \ge D_{max}\} \approx \gamma (\lambda ) e^{-\theta (\lambda )D_{max}}. \end{aligned}$$
(4)

Recall that \(\gamma (\lambda )\) is the link utilization and \(\theta (\lambda )\) is called the QoS index of the link. For a given statistical delay bound \(D_{max}\), the QoS index determines how steeply the delay violation probability \(\varepsilon\) decays. Hence, the tuple \(\{\gamma (\lambda ),\theta (\lambda )\}\) entirely determines a statistical QoS guarantee, in the form of a maximum delay and a delay violation probability, \(\{D_{max},\varepsilon \}\), for such a link.

Calculating the QoS index via taking the inverse of \(\Gamma (.)\), however, requires the knowledge of the channel service distribution C(t). Such information may not be always available, and even if it is, obtaining \(\Gamma (.)\) in closed form is generally not possible. Therefore, it is suggested in [30] to use the following approximation for obtaining \(\theta\), which becomes accurate if (4) is satisfied with equality:

$$\begin{aligned} {\widehat{\theta }} = \frac{\gamma \lambda }{\gamma \lambda {\bar{S}} + {\bar{Q}}}. \end{aligned}$$
(5)

Here, \({\bar{S}}\) is the average residual service time of the packet currently being transmitted as sampled by an arbitrary packet arrival, and \({\bar{Q}}\) is the average queue length. The accuracy of the \({\widehat{\theta }}\) estimate has been recently verified in [25].

We solve (4) for \(\theta\) by requiring the delay violation probability to be less than or equal to a target value \(\varepsilon\), i.e., \({\mathsf {Pr}}\{D(t) \ge D_{max}\} \le \varepsilon\). Hence, we obtain

$$\begin{aligned} \theta \ge -\frac{\log (\varepsilon /\gamma )}{D_{max}}. \end{aligned}$$
(6)

Consequently, from (5) and (6) we come to the conclusion that, for a given statistical QoS requirement in the form of a delay bound and a target delay violation probability, \(\{D_{max},\varepsilon \}\), and for traffic arrival rate of \(\lambda\), the following condition should be met,

$$\begin{aligned} \frac{\gamma \lambda }{\gamma \lambda {\bar{S}} + {\bar{Q}}} + \frac{\log (\varepsilon /\gamma )}{D_{max}} \ge 0. \end{aligned}$$
(7)
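As a hedged illustration of how (5)–(7) would be used in practice, the sketch below estimates the QoS index from queue-level measurements and checks the feasibility condition; the numeric inputs are placeholders, not measurements from this paper.

```python
import math

def qos_index_estimate(gamma, lam, s_bar, q_bar):
    """Estimate of the QoS index from queue-level measurements, Eq. (5)."""
    return gamma * lam / (gamma * lam * s_bar + q_bar)

def qos_satisfied(gamma, lam, s_bar, q_bar, d_max, eps):
    """Check condition (7): the estimate meets the lower bound of Eq. (6)."""
    theta_hat = qos_index_estimate(gamma, lam, s_bar, q_bar)
    theta_required = -math.log(eps / gamma) / d_max
    return theta_hat >= theta_required

# Placeholder measurements: lam in packets/s, q_bar in packets, s_bar in seconds.
print(qos_satisfied(gamma=0.6, lam=800.0, s_bar=0.002, q_bar=40.0,
                    d_max=5.0, eps=0.01))
```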

3 Effective capacity based aggregation

3.1 System and channel model

We consider a single IEEE802.11 Basic Service Set (BSS) with a number of stations (STA) and an AP forwarding traffic to the stations. Frames destined to each STA are enqueued at the AP and sent in aggregates, the size of which is determined on a beacon-by-beacon basis by the AP according to our proposed aggregation scheme. It should be noted that any other uplink/downlink traffic which does not fall under the proposed aggregation scheme can co-exist with the designated traffic and be sent using the default aggregation policy adopted by commodity 802.11 interfaces; as a result, it is not considered in our system model.

We consider a slow fading wireless channel modeled using a multi-state Discrete Time Markov Chain (DTMC) [31]. Each MIMO sub-channel is modeled using an independent DTMC with states corresponding to non-overlapping Signal-to-Noise-Ratio (SNR) regions. Each SNR region corresponds to a certain best modulation and coding scheme (MCS) for that MIMO stream, which is assumed to be selected by the link adaptation algorithm. The assumed link adaptation algorithm follows the behavior of IEEE802.11ac cards that use the same MCS across all MIMO streams, as suggested by the standard [32] and applied in IEEE802.11 products. The slow fading channel model and the link adaptation algorithm are selected to reflect a practical setting; however, it should be noted that our scheme can operate with any other channel model and link adaptation algorithm. Lastly, we assume no packet errors in our simulations. In our testbed scenario, where packet errors do exist, their effect is implicitly taken into account via the channel service process C(t), appearing as larger average queue lengths and service times.
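For illustration only, a minimal sketch of the kind of per-beacon Markov rate process assumed here is given below; the three rates and the transition matrix are arbitrary placeholders rather than our simulation parameters.

```python
import random

# Toy discrete-time Markov chain over per-beacon channel rates (Mbps).
RATES = [26.0, 52.0, 117.0]          # placeholder MCS-dependent rates
P = [[0.8, 0.2, 0.0],                # placeholder transition probabilities
     [0.1, 0.8, 0.1],
     [0.0, 0.2, 0.8]]

def next_state(state):
    r, acc = random.random(), 0.0
    for j, p in enumerate(P[state]):
        acc += p
        if r < acc:
            return j
    return len(P[state]) - 1

state, trace = 1, []
for _ in range(10):                  # one channel-rate sample per beacon interval
    trace.append(RATES[state])
    state = next_state(state)
print(trace)
```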

3.2 Optimization problem

Consider an arbitrary downlink \(l,~(l=1, \ldots , L)\) with a given QoS requirement \(\{D_l,\varepsilon _l\}\). This link is to be assigned a time allowance \(\tau _l\) during each beacon interval of duration BI. The link may consume its time allowance in a single TXOP or in multiple TXOPs, depending on the limitations set by the AP. Our objective is to have the AP schedule a given number L of downlinks at minimum resource usage, i.e., minimum total time allowance, while QoS is satisfied using (7) as the constraint. This approach leaves maximum room for newly arriving connections as well as best-effort traffic, effectively maximizing the system capacity. Our problem is expressed as follows

$$\begin{aligned}&\min \sum _{l=1}^{L} \tau _l \end{aligned}$$
(8)
$$\begin{aligned}&subject~to \nonumber \\&\frac{\gamma _l(\tau _l) \lambda _l}{\gamma _l \lambda _l \bar{S_l}(\tau _l) + \bar{Q_l}(\tau _l)} + \frac{\log (\varepsilon _l/\gamma _l(\tau _l))}{D_l} \ge 0;\forall l=1\ldots L~~~~~ \end{aligned}$$
(9)
$$\begin{aligned}&\sum _{l=1}^{L} \tau _l \le BI. \end{aligned}$$
(10)

Table 1 lists all notations and symbols used in this paper. Note that each link is in fact a G/G/1 queue, for which \({\bar{Q}}(\tau )\) and \({\bar{S}}(\tau )\) are in general non-convex functions of the time allowance \(\tau\). More importantly, these quantities do not have a closed form in general, which makes solving the optimization problem (8)–(10) directly impractical. Next, we present an approximation to this problem, which then leads to a practical solution, i.e., the proposed PID based control algorithm.

Table 1 Notations

3.3 Queueing model approximations

Consider a wireless link l whose channel rate takes values \(r_1<r_2<\cdots <r_K\), and let \(\pi _{lk}\) denote the steady-state probability of channel rate \(r_k, k=1 \ldots K\). We also make the simplifying assumption that if l is allocated a TXOP of length \(\tau _l\) during a beacon interval of length BI, then it transmits at the same rate for the entire TXOP. Approximating the system as an M/G/1 queue, the CDF of the residual service time is obtained as

$$\begin{aligned} \Psi _l (t) = {\mathsf {Pr}}(S_l \le t) = \mu _l \int _{0}^{t} \left( 1-B_l(x)\right) dx. \end{aligned}$$
(11)

Here, \(\mu _l=\frac{1}{{\mathsf {E}}[T_l]}\) is the average packet service rate of link l and the rest of the parameters are defined as in Table 1. Applying the M/G/1 analysis to the problem as in the “Appendix”, it is sufficient to find the TXOP allocated to each link, \(\tau _l\), such that,

$$\begin{aligned} \sum _{n=0}^{\infty } \gamma _l^n(\tau _l) \Psi _l^{(n+1)} (D_{l,max};\tau _l) \ge \frac{1-\varepsilon _l}{1-\gamma _l(\tau _l)}. \end{aligned}$$
(12)

Unfortunately, (12) does not have a closed form solution and even a numerical solution is very difficult to obtain. However, a closed form solution can be obtained if the link model is approximated as M/M/1. It is well known that the tail of the M/M/1 queueing delay distribution is,

$$\begin{aligned} {\mathsf {Pr}}(D>D_{l,max}) = e^{-(\mu _l - \lambda _l)D_{l,max}} \end{aligned}$$
(13)

Hence, by constraining the tail distribution to be less than the target delay violation probability, \(\varepsilon _l\), and replacing \(\mu _l\) with (23), we derive the approximate solution for the minimum required TXOP given in (14) (please refer to Table 1 for a complete list of notations).

$$\begin{aligned} \tau _l = \frac{M \cdot BI\left( \lambda _l-\frac{\ln \varepsilon _l}{D_{l,max}}\right) }{\sum _k \pi _{lk}r_k}. \end{aligned}$$
(14)

In Sect. 5, we compare the approximate analytical TXOP allocation given by (14) with the practical solution of the optimization problem in (8)–(10). Let us emphasize that (7) does not require channel state information, as it relies entirely on link level metrics such as the average queue length, the average service time and the link utilization. Essentially, (7) liberates us from the burden of using complex channel state information and deriving the channel service process C(t). The only place where we do need these rate probabilities is the comparison of our model against simulations. In that case, we obtain the probabilities from experiments run over our testbed, recording how often each bitrate is selected; we then use these values in our model and also initialize the fading Markov MIMO channel model of the simulation environment to reproduce this bitrate distribution.
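For concreteness, a direct transcription of the approximation (14) is sketched below; M is assumed to denote the packet size in bits, and the beacon interval, packet size, rate set and rate distribution used in the example are placeholder assumptions rather than the values of Table 2.

```python
import math

def txop_allocation(lam_pkts, eps, d_max, rate_probs, rates_bps,
                    beacon_s=0.1, pkt_bits=1500 * 8):
    """Approximate minimum per-beacon TXOP of Eq. (14).

    lam_pkts  : packet arrival rate (packets/s)
    rate_probs: measured steady-state distribution of the PHY rates
    """
    avg_rate = sum(p * r for p, r in zip(rate_probs, rates_bps))
    return pkt_bits * beacon_s * (lam_pkts - math.log(eps) / d_max) / avg_rate

# Placeholder inputs: 8.5 Mbps of 1500-byte packets, 1% DVP, 5 s delay bound,
# 100 ms beacon interval.
tau = txop_allocation(lam_pkts=8.5e6 / (1500 * 8), eps=0.01, d_max=5.0,
                      rate_probs=[0.2, 0.5, 0.3],
                      rates_bps=[26e6, 52e6, 117e6])
print(f"tau = {tau * 1e3:.2f} ms per beacon interval")
```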

4 A practical control algorithm

We propose a practically implementable solution to the problem in (8) by designing a Proportional–Integral–Derivative (PID) controller.

A PID controller is very simple to implement and straightforward to understand: it considers the error with respect to a certain objective and makes use of proportional, integral and derivative terms to drive the system toward the desired operating point. The proportional term controls the response time of the controller, while the derivative term mainly decreases overshoot and improves stability. The integral term, on the other hand, is used to reduce the steady state error and improve the long term response of the system.

Hence, the PID control approach has been chosen due to its simplicity and ease of parameter selection, as well as its wide usage in the context of wireless scheduling [33,34,35,36]. In particular, we propose a novel metric that is interpreted as the surplus resource provisioning, which is then easily monitored and controlled within the framework of a PID controller to yield the optimum resource provisioning. Note that constraint (10) can be relaxed to obtain an independent optimization problem for each link l, where it suffices to solve

$$\begin{aligned}&\min ~\tau _l \end{aligned}$$
(15)
$$\begin{aligned}&\frac{\gamma _l(\tau _l) \lambda _l}{\gamma _l \lambda _l \bar{S_l}(\tau _l) + \bar{Q_l}(\tau _l)} + \frac{\log (\varepsilon _l/\gamma _l(\tau _l))}{D_l} \ge 0. \end{aligned}$$
(16)

An objective value larger than BI implies that the solution is infeasible; in that case, each link has its time allowance \(\tau _l\) re-scaled so that the solution becomes feasible. This, however, results in graceful QoS degradation for all links. The scaling is simply performed at the AP for all downstream traffic; note that the vast majority of high bit-rate traffic flows in the downstream direction. In the event that there is high bit-rate upstream traffic, the scaling can still easily be achieved as follows: stations use whatever time allowance they compute without applying any scaling. This could result in one round of transmissions taking more than a beacon interval. Such an event can readily be detected by the AP, and the amount of violation can be captured as well. The AP then computes the required scaling factor and advertises it in its next beacon frame, forcing all stations to apply it to their time allowance.

Observe that as the left-hand side of constraint (16) grows, the delay violation probability becomes much smaller than the target \(\varepsilon _l\). Hence, we can interpret the quantity

$$\begin{aligned} \beta _l(t)= \frac{\gamma _l(\tau _l) \lambda _l}{\gamma _l \lambda _l \bar{S_l}(\tau _l) + \bar{Q_l}(\tau _l)} + \frac{\log (\varepsilon _l/\gamma _l(\tau _l))}{D_l}, \end{aligned}$$
(17)

as a measure of link QoS provisioning, with \(\beta <0\) and \(\beta >0\) representing under-provisioned and over-provisioned links, respectively. Clearly, \(\beta =0\) is the best choice. In fact, we claim that at optimality all constraints will be binding. This is justified by observing that the cost of over-provisioning any link is strictly positive; therefore, by complementary slackness, the constraints should be satisfied with equality. It follows that the error in \(\beta\) is a suitable signal on which to build a PID based controller.

The value of \(\beta\) is computed independently for each link l. A positive \(\beta\) on a given link implies that the link is over-provisioned, i.e., more TXOP is allocated to it than required. Likewise, a negative \(\beta\) implies that the link requires a larger TXOP. The PID controller uses this measure to appropriately tune the TXOP allocated to each link.

Fig. 1 PID controller for scheduling each link

Figure 1 shows a diagram of the PID controller for a given link. At the end of each beacon interval, the AP computes \(\beta _l\) for every link by measuring its traffic arrival rate \(\lambda _l\), the probability \(\gamma _l\) of a non-empty link queue, its average queue length \(Q_l\) and its average packet service time \(S_l\), and then using (17). It should be noted that our scheme can be implemented entirely on the AP side, for determining the aggregation size, without any modification to the IEEE802.11 MAC functions. In fact, we have implemented it over commodity APs and report the results in Sect. 5.2.

The error value \(e_l=\beta _l\), which is the surplus resource provisioning itself, is then computed for every link. At this point, the transmitter updates the time allowance of each link for the next beacon (control) interval according to (18), applying the PID control law with gains \(k_p,k_i,k_d\) to the error, its cumulative sum and its rate of change, respectively.

$$\begin{aligned} \tau _l(t+1) = \tau _l(t) - \left( k_p e_l + k_i \sum _{u=t-T}^{t} e_l(u) + k_d \frac{e_l(t)-e_l(t-1)}{BI} \right) \end{aligned}$$
(18)

If necessary, \(\tau _l\) is then scaled to fit within a beacon interval of length BI. That is, if \(\sum _l \tau _l > BI\), then each \(\tau _l\) is set to

$$\begin{aligned} \tau _l \leftarrow \frac{BI \cdot \tau _l}{\sum _l \tau _l}. \end{aligned}$$
(19)

A given \(\tau _l\) is then enforced by appropriately tuning the length of the corresponding TXOPs. Moreover, the surplus resource provisioning index \(\beta _l\) is also used to determine the order in which links are served: we serve first the link with the most negative \(\beta _l\), as it corresponds to the most under-provisioned link. Also note that if re-scaling of \(\tau _l\) happens too often, the system of links is infeasible. This could be prevented by an admission control mechanism, which in practice is not employed in IEEE802.11 networks, as connections are allowed to contend and applications are left to deal with the resulting delay or packet loss. For example, in a video application such as YouTube, the client may request a lower quality video to be streamed. Hence, the proposed aggregation algorithm continues to serve links by re-scaling their time allowance, and each link may experience some level of QoS deterioration. Last but not least, the \(\theta\) bound in (6) is not tight, since it is based on the Chernoff inequality. Hence, there is some amount of intrinsic over-provisioning in our aggregator that compensates for cases requiring re-scaling of \(\tau _l\). We have observed this behavior in our simulations.
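The control law (18) together with the re-scaling step (19) amounts to only a few lines of code. The sketch below is a minimal illustration; the gains are placeholders, and the integral term accumulates over the full history rather than the window T of (18).

```python
def pid_update(tau, beta, err_sum, err_prev, kp, ki, kd, beacon_s):
    """One PID step per link, Eq. (18), with error e_l = beta_l."""
    new_tau = {}
    for l in tau:
        e = beta[l]
        err_sum[l] = err_sum.get(l, 0.0) + e                 # integral term
        deriv = (e - err_prev.get(l, e)) / beacon_s          # derivative term
        err_prev[l] = e
        new_tau[l] = max(tau[l] - (kp * e + ki * err_sum[l] + kd * deriv), 0.0)
    total = sum(new_tau.values())
    if total > beacon_s:                  # Eq. (19): re-scale to fit the beacon
        new_tau = {l: t * beacon_s / total for l, t in new_tau.items()}
    return new_tau
```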

Algorithm 1 outlines the pseudo code of the proposed PID aggregation scheme. The time allowance of each station is stored in the vector \(\tau\), which is computed once at the beginning of each beacon interval (lines 9–14). First, the provisioning metric \(\beta\) of a particular station is computed in step 10. This is followed by applying the PID law to the station and obtaining its time allowance in step 12. Once the time allowance of every station is computed, step 14 normalizes the vector in case it exceeds the beacon interval; the normalization function is the simple one described in steps 1–7. Having computed the vector \(\tau\), the AP then starts transmitting aggregated frames to the stations whenever the driver declares that it is ready to handle the next frame (steps 15–18). The AP picks the station with the largest unused time allowance in step 15 and computes the required size of the aggregated A-MPDU frame based on the remaining time allowance in step 16. The remaining time allowance is then updated in step 17 by deducting the transmission time of the formed A-MPDU. This process continues until every station has exhausted its granted time allowance. Lastly, all statistics required for the correct functioning of the PID aggregation scheme, including the average queue size, utilization and packet service time, are updated upon every new packet arrival from the higher layers as well as upon the reception of a block ACK, which marks the end of a transmission transaction (step 19).

Algorithm 1 Pseudo code of the proposed PID aggregation scheme
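Based on the description above, a hedged reconstruction of the per-beacon service loop (steps 15–18 of Algorithm 1) could look as follows; the driver hand-off, the PHY-rate lookup and the 65,535-byte A-MPDU cap are simplifications of this sketch, not the exact implementation of Sect. 5.2.

```python
def serve_beacon(tau_remaining, queues, phy_rate_bps, max_ampdu_bytes=65535):
    """Per-beacon service loop sketched after steps 15-18 of Algorithm 1.

    tau_remaining : station -> unused time allowance (s), from the PID step
    queues        : station -> list of queued packet sizes (bytes)
    phy_rate_bps  : station -> current PHY rate (a simplification)
    """
    while True:
        eligible = [s for s in tau_remaining
                    if tau_remaining[s] > 0 and queues[s]]
        if not eligible:
            break
        # Step 15: pick the station with the largest unused time allowance.
        sta = max(eligible, key=lambda s: tau_remaining[s])
        # Step 16: fill the A-MPDU up to the remaining allowance,
        # capped by the 802.11n maximum A-MPDU length.
        budget = min(tau_remaining[sta] * phy_rate_bps[sta] / 8, max_ampdu_bytes)
        ampdu, size = [], 0
        while queues[sta] and size + queues[sta][0] <= budget:
            size += queues[sta][0]
            ampdu.append(queues[sta].pop(0))
        if not ampdu:                          # HoL packet no longer fits
            tau_remaining[sta] = 0.0
            continue
        print(f"TX to {sta}: {len(ampdu)} MPDUs, {size} B")  # driver hand-off placeholder
        # Step 17: deduct the airtime of the transmitted A-MPDU.
        tau_remaining[sta] -= size * 8 / phy_rate_bps[sta]
```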

5 Performance evaluation

We have compared the performance of our proposed PID-based aggregation algorithm with two prominent scheduling schemes from the literature, namely, Earliest Deadline First (EDF) with the maximum aggregation size, similar to [14, 19], and the deadline based aggregation of [26], denoted as “Deadline”. In the EDF scheme, the AP always serves the station whose HoL packet has the earliest deadline and aggregates as many packets as are available. In the Deadline scheme, the AP only aggregates those packets which would violate their deadline before the next beacon interval. The rationale behind the Deadline scheme is to postpone packets as much as possible, so that the largest aggregation size, and hence the best channel efficiency, can be achieved.
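For clarity, the two baseline policies can be summarized as decision rules; the sketch below assumes per-packet deadlines are known at the AP and is a simplification, not the reference implementations of [14, 19, 26].

```python
def edf_pick(queues, deadlines):
    """EDF baseline: serve the station whose HoL packet has the earliest
    deadline and aggregate every packet currently queued for it."""
    sta = min((s for s in queues if queues[s]), key=lambda s: deadlines[s][0])
    return sta, list(queues[sta])            # aggregate all queued packets

def deadline_pick(queues, deadlines, now, beacon_s):
    """Deadline baseline: per station, aggregate only the packets that would
    miss their deadline if postponed to the next beacon interval."""
    return {s: [p for p, d in zip(queues[s], deadlines[s]) if d <= now + beacon_s]
            for s in queues if queues[s]}
```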

In the following subsections, we present the performance of the proposed aggregation algorithm in comparison with the EDF and Deadline aggregation schemes, via simulations in the NS-3 environment and tests on a real implementation and set-up. We also validate the PID based solution by comparing the simulation results with those of the approximate analytical solution.

5.1 Simulations

In the NS-3 simulation environment,Footnote 3 we consider a single IEEE802.11n BSS containing one AP and multiple stations, all receiving traffic from the AP in the downlink direction. The simulation environment models the PHY and MAC layers at the packet level. The NS-3 default propagation environment is based on the YansWiFi model; YansWifiPhy is the implementation of the IEEE 802.11 physical layer, which models an Additive White Gaussian Noise (AWGN) channel with cumulative noise. On top of this, we have developed a Markov model of a MIMO fading channel at the PHY layer and incorporated it into NS-3. At the MAC layer, we use the IEEE 802.11e EDCA MAC algorithm. Packet errors may occur due to collisions or interference and are detected by the lack of an ACK, triggering a retransmission. We assume a retry limit of three, as used by default in many WiFi cards. Our aggregation scheme is implemented as shown in Fig. 2. We have created a new queueing and service policy that considers each link separately. The aggregation component is implemented in a generic manner so that it can accommodate any particular aggregation algorithm along with a generic controller for adapting its parameters.

We have taken an empirical approach to setting the coefficients of the PID controller. We ran many simulations and identified the acceptable range of parameters. We found that moderate perturbation of these coefficients has a negligible effect on the steady state performance of the system, while it does change the transient behavior. In particular, important measures such as the delay violation probability, channel utilization and average time allowance allocated to each station remain unchanged over a long interval. We have followed the generally recommended guidelines for tuning PID parameters [37]:

  1. Set the derivative and integral coefficients to zero and gradually increase the proportional term until the time allowance starts to oscillate rapidly. This step ensures a proportional gain that reduces the PID response time and quickly reaches the appropriate range of time allowance.

  2. Gradually increase the derivative term to reduce the oscillations.

  3. If the target delay violation is still above the acceptable threshold, add slightly to the integral term to ensure that the system settles at the required time allowance in the long run. The integral coefficient was not required in any of our simulations, however.

Once these parameters are obtained, they are relatively insensitive to changes in the delay bound and target delay violation probability, as well as in the number of stations.

Fig. 2 Implementation of the aggregation algorithm in ns-3

We have simulated two scenarios, one with CBR traffic and one with video traffic from an actual video trace. For CBR traffic, we evaluate the channel utilization, average end-to-end delay, delay violation probability and the amount of time allowance allocated to each station. For video traffic, the widely used Peak Signal to Noise Ratio (PSNR) metric is computed over the received frames by comparing them with the transmitted ones. Frames arriving after their deadline are considered lost by the video player on the receiving client, resulting in a degradation of PSNR. Table 2 lists our main simulation parameters, which are used by default unless noted otherwise.
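As a reminder of how the quality metric is obtained, a minimal PSNR sketch is given below; treating late frames as lost follows the description above, while concealing them by repeating the last good frame is an assumption of this sketch rather than necessarily the behavior of our tool chain.

```python
import numpy as np

def psnr(ref, rec, peak=255.0, cap_db=100.0):
    """Per-frame PSNR of a received frame against the transmitted one."""
    mse = np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)
    return cap_db if mse == 0 else 10 * np.log10(peak ** 2 / mse)

def sequence_psnr(sent, received, arrival, deadline):
    """Average PSNR over a sequence; frames arriving after their deadline
    are treated as lost and concealed by repeating the last good frame."""
    last_good, scores = sent[0], []
    for k, frame in enumerate(sent):
        if received[k] is not None and arrival[k] <= deadline[k]:
            last_good = received[k]
        scores.append(psnr(frame, last_good))
    return sum(scores) / len(scores)
```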

Table 2 Simulation parameters

Figure 3 shows the channel utilization, delay violation probability and average delay of our proposed PID scheme and the EDF and Deadline approaches. The number of stations is varied from 1 to 10 and the target delay violation probability is set to 1%. According to Fig. 3a, our proposed PID aggregation makes the most efficient use of the channel. In particular, with four stations, PID satisfies the required delay violation probability while keeping the channel utilization close to 50%, whereas EDF and Deadline utilize more channel time, at 80% and 75%, respectively. More importantly, PID can sustain up to 8 stations with a maximum utilization of 98%, while EDF and Deadline saturate at almost 100% and 95%, respectively, with 7 stations. The lower utilization of Deadline is due to its large queue build-up at the AP, which results in packet loss.

In addition, Fig. 3b supports these results, showing that PID maintains the target DVP of 1% (horizontal line) for up to 8 stations, while EDF and Deadline only manage up to 6 and 5 stations, respectively. This result suggests that our proposed approach provides roughly 30% higher capacity than EDF, owing to the better channel utilization illustrated in Fig. 3a.

Fig. 3 Comparing performance of PID, EDF and deadline based aggregation algorithms: a channel utilization, b delay violation probability and c average delay

Comparing the average delay of PID, EDF and Deadline in Fig. 3c provides valuable insight. While Deadline based aggregation maintains an average delay just below the delay bound of 5 s for up to 6 stations, its delay quickly grows, and the delay violation probability deteriorates, beyond 6 stations. EDF is on the other side of the spectrum, providing a very small delay as long as the number of stations is small. This suggests that EDF over-provisions, which explains its slightly lower channel efficiency compared to Deadline (see Fig. 3a). PID, however, maintains an average delay between those of Deadline and EDF, while remaining well below the target delay bound, which is acceptable.

Note that our approach targets high bit-rate traffic with delay bound requirements. The majority of such traffic is video streaming, which has a relatively large delay bound and may tolerate a latency of 4–5 s. Nevertheless, to show the performance of our approach for applications with a shorter delay bound, we include results for a delay bound of 1 s in Fig. 4. The figure clearly shows that PID aggregation can support up to 7 stations, while the Deadline and EDF approaches can serve 5 and 6 stations, respectively. Note, however, that the maximum number of stations handled by PID has dropped from 8 to 7 due to the shorter delay requirement.

Fig. 4 Performance comparison of PID control, deadline and EDF for a delay bound of 1 s: a channel utilization, b average delay and c DVP

Fig. 5 The effect of delay bound: a channel utilization and b average time allowance allocated to each station during a beacon interval

Let us now consider how our PID aggregation algorithm performs when the delay requirement is changed, as shown in Fig. 5. According to Fig. 5a, relaxing the delay bound improves channel utilization, but the improvement saturates beyond a certain delay value. These results are backed by Fig. 5b, which shows the average time allowance allocated per station for different delay bounds. Clearly, the improvement in channel utilization is due to allocating less time allowance for larger delay bounds. For example, with only one station, the time allowance reduces from 15.38 to 11.7 ms as the delay bound is varied from 1 to 20 s, a reduction of almost 25%, which can be a significant improvement.

These results show, in particular, how delicately the PID aggregator relates the delay bound to the amount of resources allocated to a link. The improvement, however, diminishes for larger numbers of stations. For instance, with six stations, channel utilization can be lowered by 11% when the delay bound is changed from 1 to 20 s. This is because the intrinsic over-provisioning provided by effective capacity diminishes as the number of stations becomes large.Footnote 4

Table 3 Performance comparison for scenarios including applications with different delay requirements with target DVP of 0.01
Table 4 The effect of target delay violation probability on time allowance

To show that our approach is able to differentiate between stations with different delay requirements, we have also studied the case of heterogeneous delay requirements. Table 3 presents results for stations (applications) with mixed delay requirements of 1, 4 and 7 s coexisting in the same WLAN. In this scenario there are a total of six stations, two per delay bound. Table 3 shows that our proposed PID solution allocates a different time allowance to each class of station depending on its delay requirement, while maintaining the DVP well below the target.

Unlike relaxing the delay bound, relaxing the target delay violation probability provides little improvement in channel utilization. Table 4 shows that for a fixed delay bound of 5 s, the time allowance per station is reduced by merely 1.5 ms when \(\varepsilon\) changes from 0.1 to 20%. An interesting conclusion can be drawn by comparing the effect of the delay bound and of the delay violation probability on channel utilization: reducing the delay bound itself is considerably more resource costly than meeting a given bound with a smaller violation probability. In other words, it is easier to provide a very reliable guarantee for a given delay bound than to tighten the bound itself.

Table 5 The effect of heterogeneous source rates on time allowance

We next show how our PID aggregator differentiates the resources allocated to stations based on their source traffic demand. Table 5 shows how the time allowances of three stations associated with an AP differ based on their traffic source rates. Note that for all three stations the target delay violation probability of 1% was met.

We also include curves showing the response time of the PID algorithm in Fig. 6. We simulate a scenario consisting of three stations, where each station starts its traffic flow 20 s after the previous one. The simulation parameters are the same as in Table 2. These results show how the new station and the existing ones react to the addition of one more traffic flow, and how long it takes for the system to converge. Note that, due to channel randomness, there will always be some perturbation in the results. Figure 6a shows that the time allowance converges after roughly 2–4 s. More importantly, the introduction of an additional traffic flow has no noticeable effect on the others. In addition, Fig. 6b shows how the average queue size varies with time, clearly suggesting that convergence to a certain trend is achieved almost instantly.

Fig. 6 Response of the PID algorithm to three applications starting at the 1st, 21st and 41st s of the simulation time: a time allowance, b average queue size

To verify our PID approach, we compare it with the analytical results of the queuing model obtained in Sect. 3.3. Table 6 compares the TXOP allocated to a single station according to (14) with that obtained from our proposed PID control algorithm of Sect. 4. The comparison suggests that the time allowances obtained from the model provide a very close lower bound on those computed by the PID controller, with an error of less than 10%. Hence, it can be concluded that our approach operates very close to optimality.

Table 6 Comparison of TXOP allocation by Eq. (14) and the PID controller (in ms)

Finally, we include PSNR results for the video traffic received at the client side. Figure 7 compares the PSNR of the PID, Deadline and EDF schemes. Clearly, our proposed PID aggregation algorithm provides better video quality than the other two approaches. Note that a PSNR of 40 dB corresponds to the maximum video quality relative to the original uncoded video stream. PSNR drops below 40 dB when a video packet arrives after its predetermined deadline, resulting in loss of data and deterioration of video quality. While our proposed PID aggregation scheme can support up to 15 concurrent video streams at the maximum PSNR, EDF can only maintain up to 13 concurrent connections at the same quality level.

Fig. 7 Quality of received MPEG-4 video transported on the downlink for different numbers of stations

Deadline based aggregation, however, performs much worse even for a very small number of stations. This is due to the bursty nature of video traffic, which temporarily overwhelms the Deadline based aggregation policy to the point that some packets miss their deadline. Some estimation of the traffic arrival process or the channel service process would be required to prevent this from happening. The reason why EDF does not behave like Deadline is that EDF always tries to maintain a small queue backlog. Such a policy is beneficial when traffic is saturated but results in sub-optimal channel utilization under moderate load, as can be observed from the slightly larger channel utilization of EDF compared to Deadline shown in Fig. 3a.

5.2 Tests

5.2.1 Implementation and setup

We have made extensive modifications to the Linux implementation of the IEEE802.11 driver, without breaking the standard, to implement our proposed PID control as well as the EDF and Deadline based aggregation algorithms. We have cleanly separated the service policy and the aggregation algorithm in the MAC layer from the other functions; therefore, any combination of service policy and aggregation scheme is easy to deploy using our implementation. Figure 8 depicts the software architecture of the wireless card, divided into three distinct layers: mac80211, which is the first point of entry for packets to be sent; the TX scheduler, which is the main part of the code responsible for sending a packet; and the driver, which is mostly implemented in firmware and is left intact by our modifications. In particular, we have realized the service policy by including several callbacks to it in the TX scheduler. The aggregation algorithm is realized by having the transmission module within the TX scheduler form the aggregated MPDUs, which are then handed over to the software queue for transmission by the driver.

Fig. 8 Implementation architecture of the proposed aggregation and service policy scheme

Our experiment consists of a single BSS containing one access point and up to eight stations, all equipped with IEEE802.11n cards set up to communicate in a 3 \(\times\) 3 MIMO configuration. Channel variation is artificially induced by manipulating the rate control algorithm of the AP wireless card, and the same channel realizations are repeated across different experiments so that the results can be compared fairly. All experiments are executed at an average channel rate of 90 Mbps. Tables 7 and 8 list the specifications of the hosting computers and WiFi cards for the AP and stations, respectively.

Table 7 Access point specification and configuration
Table 8 Client stations specification and configuration

5.2.2 Experimental results

Our first set of experiments evaluates the performance of our proposed PID aggregation scheme for CBR traffic of 8.5 Mbps with a delay bound of 2 s and a target delay violation probability of 1%. Figure 9 compares the PID, Deadline and EDF schemes with respect to channel utilization, delay violation probability and average delay. Clearly, PID outperforms the other two approaches, as we also observed in the simulations. In particular, under the Deadline aggregation scheme only six stations can be accommodated while satisfying the 1% delay violation probability, and EDF improves this to seven stations. PID, however, can serve up to eight stations and still operate well below the target DVP. Moreover, PID is more efficient, providing almost 10% air-time savings compared to EDF and Deadline.

Fig. 9 Experimental results comparing performance of PID, EDF and deadline based aggregation algorithms: a channel utilization, b delay violation probability and c average delay

Fig. 10 Experimental results showing quality of received MPEG-4 video transported on the downlink for different numbers of stations

The second set of experiments considers actual video streaming flows to the stations. These streams correspond to a full HD MPEG4 video with an average rate of 7.8 Mbps and a peak rate of 8.9 Mbps. Figure 10 plots the PSNR of the video received at the STAs under the different aggregation schemes. Clearly, our proposed PID scheduler maintains superior video quality for up to eight stations, while both Deadline and EDF experience significantly degraded PSNR values at eight stations. PID fails to provide a sufficient PSNR once the number of stations is increased to nine; however, at that point the AP is operating very close to its saturation point and such a result is expected.

6 Conclusions and future work

We propose a novel frame aggregation algorithm for providing statistical delay guarantee over high data rate WLANs. Our aggregation algorithm requires no knowledge of the channel state, as it uses average queue level information combined with the concept of effective capacity to achieve the statistical delay guarantee. We also propose a novel metric that captures the amount of surplus resource provisioning given a certain statistical delay requirement. This metric is used to devise a PID controller implemented at the AP side to determine the time allowance allocated for each link. Simulations as well as experimental data show that the proposed PID based aggregation algorithm performs better than EDF service policy with maximum aggregation and deadline based aggregation.

Moreover, we develop an analytical approximation to the optimal aggregation size by applying queueing theory to our problem, and we use these theoretical results to verify our PID based approach. In particular, our scheme improves the channel utilization and the maximum number of stations that can be supported for a given delay bound and delay violation probability. Furthermore, we illustrate how the delay bound and its acceptable violation probability affect the time allowance required for each station. Our results suggest that providing a tighter delay violation guarantee is less resource consuming than reducing the delay bound itself.

As future work, we intend to apply the PID aggregation scheme to the latest IEEE 802.11ac standard, which allows higher data rates and a larger aggregate frame size. In addition, we plan to use our novel resource surplus metric to devise a video rate adaptation algorithm that also provides statistical QoS guarantees.