1 Introduction

High-speed broadband networks and improvements in display technology of various devices (e.g., smart phone, table PC, and personal media player) have enabled video streaming to become the most popular application over the Internet. Most commercial video streaming services run over HTTP to provide high quality video. The majority of these services use rate adaptive algorithms that adapt the video quality, by observing the available throughput or playback buffer occupancy.

HTTP-based video streaming solutions provide multiple representations (e.g., different bitrate/quality) of the same content, and divide these representations into small segments. The content is stored at the server, and the rate adaptive algorithm at the client decides which segment to download next. The algorithms try to maximize the quality of the video, by meeting conflicting objectives in such a way as to improve the user’s viewing experience. Some of the potential objectives include selecting a set of video bitrates that are the highest feasible, avoiding needless video bitrate switches, and preserving the buffer level to avoid interruption of playback [1,2,3,4,5]. Simply maximizing the video rate risks rebuffering, whereas simply minimizing rebuffering leads to low video quality.

The estimation of throughput plays an important role in the selection of the next segment. Several methods have been proposed to estimate the throughput of the upcoming segment [6,7,8]. HTTP clients make an estimate of the future throughput from past observations to select the video rate for the next segment [9,10,11]. Accurate estimation of the throughput becomes an important challenge for the client. Inaccurate estimation may lead to selecting video bitrates that degrade user experience. To select video bitrates, the rate adaptive algorithms add playback buffer occupancy as an adjustment parameter on top of throughput estimation [7, 12,13,14].

In this paper, we first propose a throughput estimation algorithm with detection and estimation method. The proposed throughput detection method can distinguish between different types of fluctuations in the throughput. Based on the result of throughput detection, the estimation method offers a stable response to short term fluctuations, and is sensitive to persistent large fluctuations. We then propose a rate adaptation algorithm that dynamically selects the video bitrates based on the estimated throughput and the playback buffer occupancy. The objective of the proposed algorithm is to improve the viewing experience of the users. The algorithm streams high quality video, while avoiding playback interruption. The proposed adaptation algorithm minimizes the video rate changes, and makes sure that the video rate changes smoothly to improve the user experience. We perform experiments to show that irrespective of the buffer size and segment duration, the proposed algorithm improves user experience. Additionally, in a multi-client environment, we show that the proposed scheme efficiently utilizes network resources and that the HTTP clients achieve equitable video rates.

The rest of this paper is organized as follows: Sect. 2 offers an overview of HTTP adaptive streaming, and reviews the existing video streaming algorithms. Section 3 presents the proposed throughput estimation algorithm. Section 4 explains our throughput and buffer-based rate adaptive algorithm. Section 5 provides simulation results. Finally, Sect. 6 concludes the paper.

2 Overview of HTTP streaming

2.1 HTTP adaptive streaming

HTTP adaptive streaming works by monitoring network in real time, and by adjusting the quality of the video stream accordingly, without resetting the TCP connection. Figure 1 shows the basic model of adaptive HTTP streaming, which requires the server to store multiple versions of the multimedia content. At the server side, the content annotation module provides information about the characteristics of the stored multimedia content. The client initiates a request for information about the stored content, which is known as metadata. In response to the request from the client, the server sends the metadata to the client. The media preparation module provides tools for encoding and encapsulation, so that the content can be presented and efficiently delivered to the client in the correct format. On the client side, the scheduling module is responsible for scheduling the download of upcoming segments. During the download of the segments, the bandwidth estimation module estimates the throughput. The adaptation module selects a suitable bitrate depending on the received metadata and system conditions, such as the throughput and occupancy of the playback buffer. Once a segment is downloaded, it is temporarily stored in the playback buffer that feeds the player’s decoder.

Fig. 1
figure 1

HTTP streaming architecture

2.2 Related work

Currently, several methods have been proposed to estimate the throughput. Segment throughput is calculated as the ratio of segment size divided by the time it takes to download the segment. In the simplest way, measured segment throughput can be used as the throughput estimate of the next segment [10]. However, due to short term fluctuations, the throughput estimate calculated in this way will result in high frequency of fluctuations. Akhshabi et al. [9] evaluate the performance of Microsoft Smooth Streaming and Netflix player using the running average of the throughput of several segments as the estimated throughput. The method performs well under persistent throughput variations. Ran et al. [6] use the median of the throughput of the last several segments to estimate the throughput of the next segment. Rahman et al. [7] show that the McGinely dynamic indicator offers a stable response to the throughput fluctuations, while maintaining a stable playback buffer. The moving average technique [7] is accurate in slow throughput variation, but reacts late to sudden variations in the throughput. The VLC media player [8] uses the averages of all previous throughputs as the estimated throughout to download the next segment. The method responds slowly to the actual throughput variations, which increases the risk of buffer underflow. Jiang et al. [15] use the harmonic mean of the throughput of the last 20 downloaded segments to estimate the throughput of the next segment. The outliers of the throughput do not influence the estimated throughput, but the method shows delay during persistent throughput variations. Many commercial clients estimate throughput of the next segment by taking the moving average of the previously downloaded segments [4]. The method is late to respond to large variations in the throughput. The throughput estimation methods proposed so far either have an aggressive or a stable response to the throughput fluctuations. As the rate adaptive algorithms select video rates based on the estimated throughput, an aggressive response results in higher number of throughput fluctuation. On the other hand, the stable response increases the risk of buffer underflow and inefficient utilization of the bandwidth. The proposed estimation method can differentiate between small and large fluctuations in the throughput based on variations in the frequency and amplitude of the network throughput. Based on the behavior of the network throughput, the estimation method decides whether to offer an aggressive or stable response.

References [9,10,11] propose rate adaptation algorithms that select the video rates based on the estimated throughput. These algorithms have been found to be either slow to converge to optimum solution, resulting in high frequency of video bitrate switching, or to result in a higher number of playback interruptions. In an unstable environment, inaccurate throughput estimation results in the degradation of the user experience.

Many methods have been proposed to incorporate information about the playback buffer in selecting the video rate. Miller et al. [12] propose a method that divides the buffer into multiple predefined thresholds (B1, B2, B3, Bmax) where (B1 < B2 < B3 < Bmax), and takes different decisions to select the video rates when the buffer level remains in different ranges. The algorithm does not dynamically adjust the buffer threshold as the segment duration and segment sizes of a VBR encoded video stream vary. Rahman et al. [7] propose an algorithm that intelligently selects the video bitrates based on the estimated throughput and buffer occupancy, by dynamically selecting buffer thresholds based on the sizes of the upcoming segments. The algorithm does not adjust the buffer thresholds as the buffer size and segment duration varies. Huang et al. [16] propose a video rate adaption algorithm that selects the video rate by only observing the client’s playback buffer. The authors observe that due to the highly variable network dynamics, especially in the case of wireless networks, it is not easy to estimate throughput. Hence, they limit the throughput estimation to the initial stage. To handle the variation of segment sizes, the method directly maps the buffer occupancy to the segment size, and increases or decreases the video rate as the buffer builds up or drains, respectively. Furthermore, in deciding to switch the video quality, the algorithm considered the sizes of the upcoming segments. Rahman et al. [17] propose an algorithm that selects the video rates only based on the buffer occupancy by exploiting the variation of sizes of the upcoming segments. The algorithm maps the buffer occupancy to the video rate rather than the segment size, as mapping of the buffer level to the segment size results in a higher frequency of switches. Authors in [18] propose a user-centric streaming algorithm for H.264/SVC DASH streaming which adapts its quality according to the playback buffer level only. Dubin et al. [19] provides a rate adaptive algorithm that uses a double Exponential Moving Average (EMA) algorithm. The video quality is selected based on both playback buffer level estimation and throughput estimation. It is designed for multicast networks but the authors showed that it provides a stable performance under both multicast and unicast conditions. Authors in [20] propose the algorithm that adapts the video quality based on crowdsourcing data generated by users of a professional service. In addition, the authors integrate crowd information with the existing algorithms and show that read-world data can improve the performance of existing algorithms.

The proposed algorithm selects the video rate based on throughput estimation and playback buffer occupancy. The algorithm uses two video rate maps to select the video rate of the upcoming segments. To increase the video rate, the algorithm uses a rate map based on a concave function to aggressively increase the video rate in order to efficiently utilize the available throughput, and uses a linear function to conservatively decrease the video rate to avoid playback buffer interruption. The algorithms proposed so far do not dynamically adjust the video rate maps as the playback buffer sizes, segment durations and available set of video rates vary. The proposed algorithm dynamically adjusts the video rate maps as the buffer size of the client, segment duration and available video rates of the video stream vary.

3 The throughput estimation method

The rate adaptive algorithms strive to maximize the user experience by meeting conflicting video quality objectives. Some of the potential objectives include selecting the highest feasible set of video bitrates, avoiding needless video bitrate switches, and avoiding interruption of playback. The rate adaptive algorithms select the next segment on the basis of the estimated throughput. Therefore, it is important for the throughput estimation method to have a stable response to small variations in the throughput, to minimize unnecessary fluctuations in the video rate, and to react quickly to large fluctuations to minimize the risk of playback interruption due to buffer underflow.

3.1 Throughput detection method

The throughput detection method should be able to distinguish between different types of network conditions. To differentiate between fluctuations of different amplitude and frequency in the throughput, we calculate log return. The log return shows the extent of variability of the throughput in relation to the average throughput. Let T(i) and \(\overline {T} (i)\) denote the instant and average throughput, respectively, observed at the download of segment i. We calculate the log return ρ using:

$$\rho =\log \frac{{\left| {T(i) - \overline {T} (i)} \right|}}{{\overline {T} (i)}},$$
(1)

where

$$\overline {T} (i)=\frac{{\sum\nolimits_{{j=i - n - 1}}^{{i - 1}} {T(j)} }}{n}.$$
(2)

A high value of log return means that the difference between T(i) and \(\overline {T} (i)\) is significantly high, due to large fluctuations in the throughput. The client must react quickly to the large fluctuations in throughput. A smaller value of log return means a small fluctuation in the throughput or a short-term fluctuation. The client should offer a smooth and stable response to small fluctuations in throughput.

3.2 Throughput estimation method

After the throughput detection method detects the type of network condition, the client estimates the throughput. We estimate the throughput in (3) using the weighted average of the throughput observed over the last n segments.

$${T^E}(i)=\sum\limits_{{j=i - n - 1}}^{{i - 1}} {\sum\limits_{{k=1}}^{n} {p(k)} \; \times \;T(j)} .$$
(3)

The weighted factor p in (4) depends on the type of throughput fluctuation. If the throughput has large persistent variations, the throughput in recent times has higher weight, which makes the estimation quickly adjust to the actual throughput. We use the exponential function to give higher weight to recent throughput. In the case of small variations, we use the mean of the past throughput observations to provide stable estimation.

$$p=\left\{ \begin{gathered} \frac{{{\alpha ^{{k^2}}}}}{{\sum\limits_{{l=1}}^{n} {{\alpha ^{{l^2}}}} }}\,\,\,\,\,{\text{if}}\,\rho \,\, \geqslant \,\lambda \hfill \\ \frac{1}{n}\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,{\text{else}} \hfill \\ \end{gathered} \right.,$$
(4)

If the value of ρ ≥ λ, the exponential function is selected to make sure more recent throughput has higher weight; otherwise, the method uses mean of the past observations as the weighted factor. We perform experiments to select the value of λ to detect the type of throughput fluctuations. To this end we use rectangular waveform, shown in Fig. 2.

Fig. 2
figure 2

A rectangular waveform throughput trace

Amax and Amin denote the maximum and minimum values of throughput respectively and Δ represents their difference. L represents the duration of the framework and D is the proportion of time when the throughput is Amax. We observe the response of the throughput estimation method by varying value of λ as shown in Figs. 3 and 4.

Fig. 3
figure 3

Performance of the proposed scheme as the amplitude of fluctuations changes

Fig. 4
figure 4

Performance of the proposed scheme as the frequency of fluctuations changes

In the first experiment, Δ is varied while keeping Amax equal to 3000 kbps and varying Amin. We vary the values of Δ from 250 to 1500 kbps. We want to observe how the proposed estimation method reacts to small and large throughput variations. The objective is to select the value of λ that reacts quickly to the throughput variations to efficiently utilize the bandwidth and minimize the risk of buffer underflow. We vary the value of λ from 0.05 to 0.125. The value of L and D are kept to 60 s and 0.5 respectively. Figure 3 shows that for Δ equal to 250 kbps, setting value of λ equal to 0.05 results in aggressive response to the fluctuations in the throughput. As the value of λ increases, the response to the throughput fluctuation becomes conservative. As the value of Δ increases, the throughput estimation method should be able to react quickly to the fluctuations in the throughput to efficiently utilize the throughput or avoid drop in buffer occupancy due to the throughput overestimation. Figure 3 shows that for large values of Δ, the proposed method reacts quickly to the changes in throughput for all value of λ.

In the next experiment, L is varied while keeping Amax equal to 3000 kbps. The objective is to select the value of λ that stabilizes short-term fluctuations to minimize the unnecessary video rate changes. The value of Δ and D are kept to 450 kbps and 0.5 respectively. In Fig. 4, we vary the frequency of throughput variation. We vary the value of L to observe how the proposed algorithm reacts to long and short-term fluctuations. Figure 3 shows the response of the proposed estimation method as for the value of L equal to 60 s. Figure 4 shows that as the value of L is reduced to 16 s, larger value of λ results in a stable response where as a smaller values of λ show an aggressive response. As the value of λ is increased above 0.125, we observed that the proposed estimation method becomes slow to react to large fluctuations in throughput; therefore, we plotted the results only for the value of λ up to 0.125. The reason behind keeping a difference of 0.05 between the plotted values of λ is that the smaller values resulted in overlapping of the curves which makes the plot reading difficult.

The rate adaptation algorithms select the video rates on the basis of the estimated throughput and the playback buffer occupancy. As explained earlier, the major factors that affect the user experience include the average video rate, playback interruption and frequency of video rate changes. The purpose of the proposed throughput estimation method is to assist the rate adaptive algorithm proposed in the next section to select the video rates. Based on the experiments results shown in Figs. 3 and 4, we select λ equal to 0.1 for the proposed scheme. We observe that when λ is selected equal to 0.1, the proposed method reacts quickly to large fluctuations in the throughput; whereas, in case of small or short-term fluctuations in the throughput, the proposed method reacts conservatively to stabilize the estimated throughput. When the value of ρ is less than 0.1, the proposed technique uses the mean of throughputs observed over the previous n segments. As the value of ρ increases above 0.1, the proposed scheme selects the exponential function to react quickly to the variations in throughput.

To evaluate the performance of the proposed method, we implement the throughput estimation method in the network simulator, ns-3. The server offers discrete bitrates from 400 to 3000 kbps, with a step size of 200 kbps. The duration of each segment is 2 s, and the client starts playback after a segment has completely downloaded. Many commercial clients [21] use the previous 10 samples to estimate the throughput; therefore, we set the value of n equal to 10 for the proposed method. We set the value of α equal to 0.9 to react quickly to the large fluctuations in throughput. The video bitrate is determined by selecting the highest video rate that is less than the estimated throughput.

Figure 5 uses the proposed throughput estimation method to estimate the throughput. The proposed scheme shows a smooth response to small fluctuations; it is able to detect small variations, and offers a stable response to small fluctuations. Furthermore, the proposed scheme reacts quickly to large drop in the throughput. The rate adaptive algorithms select the video rates based on the estimated throughput; therefore, a late response to a large drop increases the risk of interruption in the playback. The proposed scheme is able to detect a large drop, and in order to quickly react to the drop in throughput, estimates the throughput exponentially. As the proposed scheme shows a smooth response to small fluctuations, it reduces the video bitrate switches. Figure 5 shows that the client does not change the video rate during small throughput fluctuations. As the throughput suddenly drops, the video rate drops quickly to avoid buffer underflow. As the client selects the highest video rate that is less than the estimated throughput, the buffer level increases gradually. When the throughput suddenly drops, the buffer initially drops, but as the client quickly drops the video rate, the buffer level stabilizes.

Fig. 5
figure 5

Throughput estimation under a predetermined network scenario

To further evaluate the performance of the proposed scheme, we compare with the method that estimates throughput by dividing the download size by the download time and passing it through moving average filter [21].

Figure 6 compares the performance of the proposed scheme with the moving average throughput estimation method. Figure 6a shows that unlike proposed scheme, moving average method reacts slowly to the actual throughput. This not only results in underutilization of the available throughput when the throughput increases but also risks playback interruption due to buffer underflow when the throughput suddenly drops. In case of a gradual drop and increase in the throughput, the proposed scheme accurately estimates the throughput. The weighted average method reacts slowly to the actual throughput which results in inefficient utilization of the throughput.

Fig. 6
figure 6

Performance comparison between proposed and moving average throughput estimation method

4 Proposed algorithm

In this section, we propose a segment and buffer aware (SABA) rate adaptive algorithm. The proposed rate adaptive algorithm selects the video rate based on the estimated throughput and the playback buffer level. Video rates and rebuffering events are important factors for improving the user experience. In addition, frequent video rate switching has been found to annoy the viewer. The main goal of the proposed algorithm is to adaptively select a video rate from a set of video rates R = {R1, R2, R3,…, Rn}, to optimize the viewing experience.

4.1 System model

The video stream is segmented into n segments, each containing τ seconds of playback, and stored at the server side. Each segment is available in multiple bitrates. The set of representations available for the video stream is denoted by R. The client dynamically selects a video rate from the set R. The client selects the kth video rate, Rk, from the set R for each segment, to adapt the video according to the estimated throughput and playback buffer. Rmin and Rmax are the representations with the lowest and highest video rate in the set R.

The presented work uses a serial segment fetching method to download segments, which requests the next segment after the current segment has completely downloaded. Once the current segment has completely downloaded, it adds data of τ seconds to the buffer. After the first segment has downloaded, the client starts playing the video.

4.2 Adaptive bitrate algorithm

Algorithm 1 provides the algorithm’s pseudo-code. We invoke Algorithm 1 immediately after segment i-1 is downloaded. The algorithm selects the representation for the download of the next segment i. The algorithm considers the buffer level and throughput together.

To download the first segment, the client always selects the minimum available video rate, Rmin. There are two reasons for selecting Rmin as the video rate of the first segment. First, as the buffer builds up from being empty, it carries little information with which to select a video rate. We consider a conservative approach at the start, and as the buffer gradually increases, we start taking more risk in selecting the video rate. Secondly, downloading the segment with the smallest video rate reduces the initial delay. Waiting time impairment such as initial delay is of considerable interest in HAS systems [22].

To select the kth video rate, two conditions should be satisfied. Firstly, the selected video rate should be less than the estimated throughput. To avoid depletion of the buffer, the first condition makes sure that the selected video rate is below the estimated throughput. The throughput is estimated using the estimation method described in Sect. 3. Secondly, for a client to select the kth video rate, the buffer level should be higher than the threshold, Bk. The reason behind this condition is to reduce the risk of playback interruption due to buffer underflow in case the throughput is estimated inaccurately.

The video rate for the next segment is selected on a segment-by-segment basis. We consider the buffer dynamics when the segment has completely downloaded. Let B(i−1) be the buffer level at the end of the download of segment i−1. B(i) is then given by:

$$B(i)=B(i - 1)+\tau - \left[ {\tau \times \frac{{{R_k}(i)}}{{T(i)}}} \right],$$
(5)

where Rk(i) is the kth video rate from the set R, and T(i) is the video throughput observed during the download of segment i. Equation (5) shows that a video rate greater than the throughput drains the playback buffer.

figure a

We use mathematical buffer models to select the thresholds, Bk, used to select the video rates. The buffer models are based on a mathematical function that restricts the video bitrates based on the buffer occupancy level. We use two buffer models to select the video rates based on the buffer occupancy levels. Figure 7 shows that we use buffer models based on concave and linear functions to increase and decrease the video rates as the buffer occupancy increases and decreases, respectively. When the buffer level increases, we use the concave buffer model to restrict the video rates; and when the buffer level decreases, we switch to the linear buffer model. In Fig. 7, Bck and Blk are the buffer occupancy thresholds to select the kth video rates when the client uses the concave and linear functions, respectively. When the buffer occupancy increases, the client uses the threshold Bck to select the video rate; and when the concave wave suggests a lower video rate, the client switches to the threshold Blk to pick the video rate. It selects the threshold, Bk, to select the kth video rate as:

$${B_{\text{k}}}=\left\{ \begin{gathered} B{{\text{c}}_{\text{k}}}\,\,\,\,\,\,\,{\text{when}}\,{\text{the}}\,{\text{client}}\,{\text{decides}}\,{\text{to}}\,{\text{increase}} \hfill \\ \,\,\,\,\,\,\,\,\,\,\,\,\,\,{\text{the}}\,{\text{video}}\,{\text{rate}} \hfill \\ B{{\text{l}}_{\text{k}}}\,\,\,\,\,\,\,{\text{when}}\,{\text{the}}\,{\text{client}}\,{\text{decides}}\,{\text{to}}\,{\text{decrease}}\, \hfill \\ \,\,\,\,\,\,\,\,\,\,\,\,\,\,{\text{the}}\,{\text{video}}\,{\text{rate.}} \hfill \\ \end{gathered} \right.$$
(6)
Fig. 7
figure 7

The video rate map based on concave and linear buffer models

The buffer model based on the concave function to calculate the video rate restriction \(\gamma\) for a segment si [23], is given by:

$$\gamma ({s_i})=\alpha \times {\log _b}({\delta _i} \times c),$$
(7)

where i ϵ [1,N] represents the segment number, δi the buffer occupancy level, and α, b, c are the predefined parameters to fine-tune the model. The buffer occupancy level ranges from 0 to 1; 0 means that the buffer is empty and 1 means that the buffer is full. The parameters can be fine-tuned to manage the aggressiveness of the buffer model. The reason we use a concave function is that the client can more quickly switch to higher quality levels at the beginning, or recover from low buffer occupancy. Now we describe the semantics of the parameters α, b and c. The parameter α is set to the maximum available video rate, Rmax. The parameter c enables a minimum buffer fill state, referred to as steady phase. This means that the client will download the segments with the lowest available video rate, until the buffer occupancy reaches the threshold Bmin. Once the buffer occupancy increases above Bmin, the algorithm enters the adaptation phase. The parameter c influences and modifies the range of the steady state. To set the value of Bmin equal to 20% of the buffer size (δ = 0.2), the value of c is set to 1/δ = 5. This means that until δ ≤ 0.2, the client stays in steady state, because for c = 5 and δ ≤ 0.2, the value of \(\gamma\) ≤ 0. Setting the base of the logarithm function b equal to 5 would result in δ = 1. To aggressively select the video rates, the video rate map selects the maximum available rate, Rmax, when the buffer occupancy is 80% (δ = 0.8) of the buffer size. Therefore, we use the base of 4 i.e. the parameter b = 4, which results in \(\gamma\) = Rmax (or a), when δ = 0.5 and c = 5. As the value of δ increases, the client stays at the current video rate, so long as the value of \(\gamma\) suggested by (7) does not pass the value of the next highest available video rate.

When the concave function suggests a lower video rate, the client shifts to the linear function to select the video rates for the upcoming segments. The buffer model based on the linear function to calculate the video rate restriction \(\gamma\) is given by:

$$\lambda ({s_i})=\left\{ \begin{gathered} {R_{\hbox{min} }}\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,B(i - 1)\,<\,{B_{\hbox{min} }} \hfill \\ {R_{\hbox{min} }}+\frac{{B(i - 1) - {B_{\hbox{min} }}}}{{0.8 \times {B_{\hbox{max} }}}}\, \times ({R_{\hbox{max} }} - {R_{\hbox{min} }})\,\,\,\,\,\,\,\,{B_{\hbox{min} }} \leqslant B(i - 1)<0.8 \times {B_{\hbox{max} }} \hfill \\ {R_{\hbox{max} \,}}\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,B(i - 1) \geqslant 0.8 \times {B_{\hbox{max} }} \hfill \\ \end{gathered} \right..$$
(8)

As the buffer level drops, the client decides to switch to the lower video rates, since the depletion of the buffer indicates that the selected video rate is higher than the available throughput. One of the important objectives of the rate adaptation algorithms is to select the highest feasible video rate, but not at the expense of buffer underflow. The video clients do not have control over TCP sockets, and HTTP/1.1 does not support the termination of ongoing segment transfer, so the client can only switch to a different video rate when the segment download finishes. If the throughput suddenly drops in the middle of a segment transfer, the buffer may run dry before the client switches to a lower video rate. Authors in [24] suggest that the user experience improves when the video rate is increased aggressively as it makes the users believes that the provider is attempting to maximize the QoE. Figure 7 shows that the buffer threshold to select R3 as suggested by concave function Bc3 is greater than the buffer threshold suggested by Bl3. This means that if the client adopts concave behaviour, it can more aggressively select the video rate, in comparison to if it adopts linear behaviour. When the estimated throughput and the buffer level drops, the client decides to switch to the less aggressive linear function, to avoid the risk of buffer underflow. Similar to the video rate map based on the concave function, the linear rate map always selects Rmin when the buffer level drops below Bmin, and selects Rmax when δ ≥ 0.8. As the buffer level decreases, the client stays at the current video rate so long as the value of \(\gamma\) suggested by (8) does not drop below the value of the next lowest available video rate. When the buffer level drops below Bmin, Rmin is always selected.

First, we consider the scenario of an increase in throughput, and a subsequent increase in the buffer level. To increase the video rate in response to the increase in throughput and buffer level, four conditions should be satisfied:

  1. 1)

    R↑< TE(i).

  2. 2)

    The buffer level should be greater than Bc↑.

  3. 3)

    R(i − 1) ≠ Rmax.

  4. 4)

    TE(i) > TE(i−1).

We denote video rates higher and lower than the current video rate by R↑ and R↓, respectively. To avoid buffer drop, the first condition makes sure that the selected video rate is less than the estimated throughput. As the video rate cannot be adapted until the download of the next segment, in the case of a sudden drop in throughput, the second condition reduces the probability of a buffer underflow event. As explained earlier, when the client decides to increase the video rate, it selects Bk using the concave function. The last condition reduces the frequency of video rate switches, by not switching up the video rate when the client observes a drop in throughput. This avoids the risk of a likely step down in the near future.

Next, we consider the scenario of a decrease in throughput, and a subsequent decrease in the buffer level. We stay at the current video rate, until the buffer level drops below Bck. This is to minimize the frequency of video rate switches, by not reacting to short-term fluctuations. Once the buffer level falls below Bck, we shift to the linear function to select the video rates. We continue to reduce the video rate, until the selected video rate is less than the estimated throughput and the buffer level is above the threshold Blk.

4.3 Smoothing video rate switches

The proposed algorithm selects the video rate based on both buffer occupancy and the estimated throughput. The variations in the playback buffer level results in video rate switches. Figure 8 shows examples of video rate switches due to variations in buffer level. In scenario 1, as the buffer level drops below the threshold Bck, the client steps down the video rate from video rate. In scenario 2, as the buffer level increases above Bck, the client steps up the video rate. A small fluctuation in throughput may result in fluctuation of the buffer level around Bck, which means a high frequency of video rate switches.

Fig. 8
figure 8

An example of video rate switches due to variations in the buffer level

Many video streaming services encode their videos in variable bitrate (VBR). Encoding static scenes with fewer bits and active scene with more bits allows more flexibility and efficient utilization of bits. While all the segments are of equal duration, τ, the size of each segment varies. The larger segments will take more time to download, compared with the smaller segments. Therefore even in a stable network environment, as the client downloads segments of variable sizes, the buffer level may fluctuate. This results in higher video rate switches, which impair the viewing experience [2, 4].

To this end, we add a buffer zone around Bck, within which, if the buffer level stays, the client avoids switching the video rate. To explain how the proposed method smooths out the video rate switches, Fig. 9 introduces three scenarios. In scenario (a), the buffer level before downloading segment i lies between \(\overline {B}\) and Bck. If the buffer level increases above Bck, the client switches up the video rate. In scenarios (b) and (c), the buffer level before downloading segment i is higher than Bck. If after downloading the ith segment, the buffer level stays between Bck and \(\overline {B}\), the client does not switch the video rate; whereas, if the buffer level drops below \(\overline {B}\), the client switches down the video rate. The buffer zone should be large enough to absorb the variations in buffer level, but not at the expense of risking buffer underflow. The larger the segment duration, the larger the expected variation in buffer level. Therefore, we set the value of \(\overline {B}\)to:

$$\overline {B}=B{{\text{c}}_{\text{k}}} - \tau ,$$
(9)

where τ is the segment duration.

Fig. 9
figure 9

Selection of video rates as the playback buffer level fluctuates

5 Performance evaluation

5.1 Evaluation setup

We use network simulator, ns-3, as the experimental simulation environment. Our simulation adopts three rate adaptive algorithms as benchmarks. Besides the rate adaptive algorithm proposed in our previous work [7], we adopt the algorithms proposed in [12, 13, 19], as benchmarks to demonstrate the efficiency of the proposal. In the results, we refer to the algorithms proposed in [7, 12, 13, 19] as BBAB, AAA, SARA and MAL respectively. In the simulation, we evaluate the algorithm under varying network conditions, buffer sizes, and segment durations. We modified the code available in https://github.com/djvergad/dash to perform our experiments.

The length of the video is 400 s. To achieve adaptive streaming, the HTTP server offers the client seven levels of representation to adapt the video rates. These video rates are 356, 500, 800, 1200, 1800, 2500 and 3000 kbit/s. Figure 10 shows the topology implemented in this paper. The topology consists of an HTTP server, HTTP client, and a pair of routers. The link between the routers is our bottleneck link. To vary the throughput across the bottleneck, we add the UDP traffic between the routers.

Fig. 10
figure 10

The network topology

5.2 Bitrate adaptation performance

First, we demonstrate how the proposed algorithm performs under multiple environments. For these experiments we set buffer size to 60 s. Figure 11 demonstrates the video rate selected by the proposed algorithm under a small throughput fluctuation scenario. This figure plots the values from the middle of the streaming session, as the objective is to show the response to short term fluctuations. Figure 11 shows that the proposed algorithm is stable in response to small fluctuations, which reduces the frequency of video rate switches. The reason for the stable response is that the buffer distance between Bk(i) and Bk+1(i) and the addition of the buffer zone provides a cushion, and reduces the frequency of video rate switches.

Fig. 11
figure 11

The response of the proposed algorithm to small throughput fluctuations

Figure 12 demonstrates the response of the proposed algorithm to a large throughput drop. An important property of an adaptive algorithm is that it should have a swift response to large fluctuations. To make sure that the throughput drop is not due to a short-term fluctuation, the proposed algorithm waits for the buffer level to drop below . Once the buffer level drops further, the algorithm quickly switches down the video rate, to avoid the risk of playback interruption. The proposed rate adaptive method quickly switches down the video rate, because in the case of a large variation in the throughput, the throughput estimation method proposed in Sect. 3 exponentially varies the throughput.

Fig. 12
figure 12

The response of the proposed algorithm to a large throughput drop

Figure 13a shows that as the throughput gradually increases, the proposed algorithm increases the video rate. As the algorithm adopts the concave behaviour when the buffer level increases, it quickly switches to higher video rates to efficiently utilize the throughput. When the throughput gradually drops, the proposed algorithm maintains a high video rate without risking buffer underflow. Figure 13b shows that the proposed algorithm ensures that a small drop in throughput doesn’t result in unnecessary stepping down of the video rate.

Fig. 13
figure 13

Response of the proposed algorithm to the gradually increasing throughput and decreasing throughput scenarios

5.3 Single-user scenario

The topology of a single-user scenario consists of an HTTP server, an HTTP client, and a pair of routers. We analyze the algorithms for the scenarios mentioned in Table 1. We demonstrate the impact of the buffer size and segment duration on the performance of the algorithms. The HTTP clients offer distinct buffer sizes. The rate adaptive algorithms should be able to guarantee QoE under different client settings. We set the buffer size to 20, 40 and 60 s, and evaluate the performance of the rate adaptive algorithms. Then, we demonstrate how the algorithms perform as the segment duration varies. In the results, we refer to the throughput traces shown in Fig. 14a and b employed for single-user scenario as Scenario A and Scenario B respectively. Scenario A produces bandwidth fluctuations of variable amplitude. We evaluate how the algorithms adapt the video rate as the amplitude of throughput varies. Scenario B initially increases the video rate gradually and then produces large drops in the throughput of gradually increasing durations. We evaluate how the algorithms adjust the video rates when there are small and large variations in the throughput.

Table 1 Buffer sizes and segment durations for the implemented scenarios
Fig. 14
figure 14

Throughput trace employed in the evaluation

5.3.1 Scenario A

First, we set the buffer size and segment duration to 60 and 2 s respectively. Table 2 shows the statistics of the algorithms over the streaming session. The SARA algorithm results in the most fluctuating bitrate curve, and the AAA algorithm is the most stable, but to the detriment of video rate. The proposed method and the BBAB algorithms provide a smoother bitrate curve with higher bitrate. The proposed algorithm is able to achieve a high video rate when the network conditions improve. For downward switching, abrupt switching impairs the QoE, as compared to smooth switching [5, 25,26,27]. On average, the minimum quality level is rated 30% better quality in case of gradual video rate switches compared to an instantaneous switch [25]. The maximum downward switch when the client switches down the video rate means the largest video rate difference between any two consecutive segments over the whole session. Although the average of the switches and the standard deviation (STD) of the video rates selected by the proposed algorithm are higher, the lower value of the maximum downward switch shows that the proposed and AAA algorithms smoothly switch down the video rate. Unlike downward switching, viewers prefer abrupt increase in the video quality for upward switching [28, 29]. The higher value of the maximum upward switch shows that the proposed algorithm aggressively increases the video rate, to better utilize the available throughput. The MAL algorithm achieves high video rate and small number of video rate switches but experiences playback interruption for 1.6 s. The reason behind playback interruption is that the MAL algorithm does not check whether the estimated throughput is lower than the selected representation. Figure 15 shows the percentage of mid to high video rate segments downloaded by the client for Scenario A. As mentioned earlier that the user experience improves when the video rate is increased aggressively [24]. In addition, long spell of good quality video improves the user experience [24, 26]. Figure 15 shows that only the proposed, SARA and MAL algorithms are able to efficiently utilize the bandwidth and download the segments encoded with the highest available video rates. However, the SARA and MAL algorithms stream the high quality video at the expense of high number of video rate switches and playback interruption respectively.

Table 2 Statistics of different adaptive methods when buffer size is 60 s and segment duration is 2 s
Fig. 15
figure 15

The percentage of the time the client downloads the segments encoded at 1800, 2500 and 3000 kbps during the implementation of Scenario A

Table 3 shows the statistics of the algorithms when the buffer size is set to 40 s. The SARA algorithm achieves the highest average video rate, but results in the highest frequency of video rate switches. The SABA, BBAB, and MAL algorithms achieve high average video rate and low frequency of video rate switches. The AAA algorithm stabilizes the video rate curve, but achieves a low video rate. Table 3 shows that both the proposed and AAA algorithms smoothly switch down the video rate to improve the user experience. As explained earlier, the proposed algorithm has a higher average of switches, but this is due to an aggressive increase in the video rate in order to efficiently utilize the throughput. The MAL algorithm experiences playback interruption for 3.7 s. Similar to the previous experiment, the AAA and BBAB algorithms cannot stream the video at the highest available video rate.

Table 3 Statistics of different adaptive methods when buffer size is 40 s and segment duration is 2 s

Table 4 shows the statistics of the algorithms when the buffer size is set to 20 s. Similar to previous scenarios, the SARA algorithm achieves the highest average video rate, and results in the highest number of video rate switches. The BBAB algorithm achieves a video rate similar to the SABA algorithm when the buffer size is set to 60 and 40 s, but in the case of a smaller buffer size, the BBAB algorithm is nearly 400 kbps worse than the proposed method. The reason is that the BBAB algorithm requires large buffer sizes to select higher video rates. The AAA algorithm is the most stable method in the case of larger buffer sizes, but as the buffer size reduces, the frequency of the video rate switches increases. The proposed algorithm keeps the frequency of video rate switches low. The AAA algorithm conservatively increases the video quality, whereas the BBAB algorithm cannot select a video rate higher than 1800 kbps. SARA and SABA algorithms achieve high average video rate; but the SARA algorithm achieves it is at the expense of higher changes in the video rate. The MAL is the only algorithm that experiences playback interruption due to buffer underflow for 2.1 s.

Table 4 Statistics of different adaptive methods when buffer size is 20 s and segment duration is 2 s

Currently, video streaming services deploy segment duration differently in their services. Microsoft Smooth Streaming, Netflix and Apple HTTP Live Streaming offer segment duration of 2, 4 and 9 s respectively [30,31,32]. We set the segment durations to 2, 4 and 10 s, and evaluate the performance of the rate adaptive algorithms. The buffer size is set to 60 s for all of the experiments.

Table 5 shows the statistics of the algorithms results when the segment duration is set to 4 s. In comparison to the experiment where the segment size is set to 2 s, all of the algorithms except the AAA algorithm achieve similar average video rates. The average video rate of the AAA algorithm drops by roughly 200 kbps. The AAA algorithm has the most stable video rate curve followed by the MAL algorithm, whereas the SARA algorithm results in the highest frequency of video rate switches. Table 5 shows that the SABA algorithm has a slightly higher number of video rate switches as compared to the BBAB algorithm, but switches down the video rate more smoothly. In case of the SABA algorithm, the majority of video rate switches are between high and mid-quality. The experiments have shown that limited noticeable difference in quality is observed between high and mid-quality switches during video playback [25]. The AAA algorithm downloads the highest percentage of low quality segments.

Table 5 Statistics of different adaptive methods when buffer size is 60 s and segment duration is 4 s

In the next experiment, we increase the segment duration to 10 s. Table 6 shows that the BBAB algorithm has a stable response, but the average video rate drops. Furthermore, BBAB algorithm does not experience a downward switch. Figure 15 shows that the BBAB algorithm streams more than 70% of the video at 1800 kbps. It means that the algorithm does not efficiently utilize the bandwidth. The SABA algorithm outperforms BBAB by almost 227 kbps. Also, it smoothly changes the video rate to minimize the degradation of the user experience, and has a low average video rate switch. Like the previous experiments, the SARA algorithm achieves a higher average video rate, and experiences higher video rate switches, while MAL algorithm achieves the lowest average video rate. Figure 15 shows that only the proposed and SARA algorithms are able to stream the video at the highest available video rate. The SARA algorithm streams the video at the highest available video rate at the expense of high number of video rate switches and playback interruptions. The SARA and MAL algorithms experience playback interruptions for 20.8 and 0.8 s, respectively. The SABA algorithm downloads a high percentage of high quality segments, and smoothly switches the video rate.

Table 6 Statistics of different adaptive methods when buffer size is 60 s and segment duration is 10 s

5.3.2 Scenario B

In this section we evaluate the algorithms for Scenario B. Table 2 shows that the SARA and MAL algorithms achieve high video rate but at the expense of higher video rate changes and playback interruptions, respectively. The MAL algorithm experiences playback interruptions six times during the streaming session. The proposed algorithm achieves high average video rate. It experiences a slightly higher number of video rate switches because the proposed algorithm aggressively reduces the video rate when the throughput suddenly drops to mitigate the risk of buffer underflow. The BBAB algorithm has a lower average video rate and lower number of video rate switches than the proposed algorithm. Figure 16 shows that similar to Scenario A, only the proposed, SARA and MAL algorithms stream the video at the highest available video rate. The SARA and AAA algorithms download high percentage of video rates encoded at 3000 kbps, but to the detriment of the user experience as they result in high number of video rates switches and playback interruption, respectively. Figure 17 shows that the BBAB algorithm achieves the highest eMOS values followed by the proposed algorithm. The eMOS of the SABA algorithm is 0.04 worse than the BBAB algorithm.

Fig. 16
figure 16

The percentage of the time the client downloads the segments encoded at 1800, 2500 and 3000 kbps during the implementation of Scenario B

Fig. 17
figure 17

eMOS values of the algorithms for scenario B

Then we evaluate the algorithms for the scenario when the buffer size is set to 40 s. The SARA algorithm results in the highest number of video rate of switches, whereas the MAL algorithm results in three playback interruptions, which degrade the user’s experience. The BBAB algorithm has slightly less number of video rate switches because it takes a conservative approach in selecting higher video rates; therefore, it never selects the highest available video rate. The BBAB algorithm is nearly 60 kbps worse than the proposed method. Similar to the previous scenario, the BBAB and AAA algorithms cannot stream the video at the highest available video rate. The AAA algorithm achieves the lowest video rate among the compared algorithms. The SABA algorithm achieves the highest eMOS value followed by the BBAB algorithm.

In the next experiment we set the buffer size to 20 s. Like the previous experiments, SARA and MAL algorithms result in high number of video rate switches and playback interruptions, respectively. The MAL algorithm experiences nine playback interruptions, which degrade the user’s experience. The proposed algorithm achieves a high average video rate. It experiences slightly high number of video rate switches. The reason is that due to small buffer size and the ability of the proposed algorithm to efficiently utilize the bandwidth, when the throughput increases the proposed algorithm quickly increases the video rate and as the throughput drops, it decreases the video rate, while making sure that it does not experience buffer underflow. Figure 16 shows that the proposed algorithm streams the majority of the video at high video rates. The SABA algorithm achieves the highest eMOS value followed by the BBAB algorithm.

Next, we set the buffer size to 60 s and increase the segment duration to 4 s. The proposed algorithm selects high video rate while avoiding the playback interruption. The MAL algorithm experiences three playback interruptions due to buffer underflow. The BBAB algorithm results in the lowest number of video rate changes but it achieves average video rate 150 kbps worse than the proposed algorithm. Figure 17 shows that the proposed algorithm achieves the best eMOS among the compared algorithms.

Next, we increase the segment duration to 10 s. The proposed algorithm achieves the highest average video rate followed by MAL. The MAL algorithm experiences four playback interruptions. The AAA algorithm results in the lowest average video rate, whereas the SARA algorithm experiences the highest number of video rate changes. Similar to Scenario A, in case of BBAB algorithm, the average video rate drops significantly. Figure 16 shows that only the proposed and SARA algorithm is able to select the segments encoded at the highest available video rate. The AAA and BBAB algorithms stream majority of the video at low video rates. Figure 17 shows that the eMOS value of the BBAB algorithm drops sharply as the segment duration is increased to 10 s. The proposed algorithm achieves the highest eMOS value.

The experiments show that the BBAB algorithm keeps the frequency of video rate switches low, but when the buffer size drops to 20 s or the segment duration is increased to 10 s, the average video rate drops significantly. The SARA algorithm achieves a high average video rate, but at a high frequency of video rate switches. On the other hand, the AAA algorithm stabilizes the video rate curve, but achieves the lowest video rate among all the algorithms. The MAL algorithm stabilizes video rate changes but at the expense of playback interruptions, which degrades the user’s experience. The proposed algorithm irrespective of the buffer size and segment duration achieves high video rate and minimizes video rate switches while avoiding playback interruption. Furthermore, assures a smooth switch from the higher rate to the lower video rate.

5.4 Multi-client scenario

In this section, we analyze the performance of the algorithms when multiple clients share the bottleneck. Figure 18 shows the topology implemented for the multi-user scenario. The bandwidth of the bottleneck link is 10 mbps for all experiments. The algorithms are evaluated for varying number of clients, buffer sizes, and segment durations. Similar to the single client-scenario, we set the buffer size to 20, 40 and 60 s and set the segment duration to 2, 4 and 10 s.

Fig. 18
figure 18

Multi-client network topology

In an environment where multiple clients compete for the bottleneck, the clients are inefficient and select low-quality video rates and bandwidth is shared unfairly among the competing clients. The inefficiency at time t is given by the following [15]:

$${\text{Inefficiency}}=\frac{{\left| {\sum\nolimits_{x} {{b_{x,t}} - W} } \right|}}{W},$$
(10)

where W is the bandwidth, and each client x selects bit rate bx,t. [33] defines the unfairness metric for two competing clients as the average of the absolute bit rate differences between the corresponding segments requested by each client. To evaluate unfairness, this is generalized to multiple players as\(\sqrt {1 - JainFair}\), where JainFair is the Jain fairness index [34] of bx,t over all players. Let RA={RA1,RA2,…,RAn} denote the set that contains the video rates achieved by n competing clients. Additionally, to evaluate the unfairness metric for more than two users, we define the parameter diff as follows:

$$diff=Max\left( {{R^A}} \right) - Min\left( {{R^A}} \right).$$
(11)

Ideally, the values of inefficiency, unfairness and diff should be zero. Low values of inefficiency, unfairness and diff are desired; a low value of inefficiency means that the client selects the highest feasible bit rates lower than the actual throughput and low values of diff and unfairness means that the competing clients achieve equitable video rates.

5.4.1 Five-clients scenario

In this section we compare the adaptive algorithms when five clients share the bottleneck. We evaluate the algorithms for scenarios mentioned in Table 1. The results show that the proposed scheme provides the best performance among the compared algorithms overall. The proposed algorithm results in low inefficiency, unfairness and diff values. Figure 19 shows that the SARA and SABA algorithm are the least inefficient whereas the AAA algorithm is the most inefficient algorithm. Figure 20 shows the diff values of the compared algorithms for each scenario. Figure 21 shows the average diff value of the compared algorithms. Figures 19 and 20 show that the proposed algorithm has the lowest unfairness and diff values. Figures 21 and 22 show that the proposed algorithm makes sure that the client switches the video rate gradually while keeping the video rate switches low. The performance of the BBAB algorithm fluctuates from one environment to another. The BBAB algorithm shows improved performance for large buffer sizes and small segment durations, but when the segment duration is increased or buffer size is decreased, the performance of the BBAB algorithm degrades. The SARA algorithm has a low inefficiency value but results in a high unfairness and diff values. Furthermore, it experiences most number of video rate changes. The AAA algorithm has a high inefficiency, unfairness and diff values. Figure 22 shows the video rate switches experienced by the algorithms. The BBAB algorithm has the lowest number of video rate switches. As explained earlier, a large downward video rate switch affects user’s experience more than a gradual downward switch. The proposed algorithm on average experiences 1.25 more video rate switches per 100 s compared to BBAB algorithm but the proposed algorithm results in lower average downward switch. The MAL algorithm performs the worst among the compared algorithms as it experiences a high number of video rate switches. The clients experience on average 10 playback interruptions during the implementation of the MAL algorithm for each scenario. The rest of the algorithms streamed the video without experiencing playback interruption.

Fig. 19
figure 19

Inefficiency and unfairness of the compared algorithms

Fig. 20
figure 20

diff values of the compared algorithms for each scenario mentioned in Table 1

Fig. 21
figure 21

Average diff and downward switch values of the compared algorithms

Fig. 22
figure 22

Video rate switches for each scenario mentioned in Table 1

6 Conclusion

In this paper, we propose a throughput estimation method and a rate adaptive algorithm for HTTP adaptive streaming. The proposed throughput estimation method can accurately estimate the throughput of the upcoming segment based on previous samples. We show that the throughput estimation method is sensitive to large fluctuations, and robust to small fluctuations. The objective of the adaptive bitrate streaming algorithm is to improve the viewing experience of multimedia streaming applications. The proposed rate adaptive algorithm offers stable response during small variations in the throughput to minimize the video rate switches, and quickly reacts when there are large variations in throughput to minimize the risk of buffer underflow. The proposed algorithm provides high quality video by achieving a high video rate while preventing interruption in playback. We show that the algorithm minimizes the video rate switches, and makes sure that the video rate changes smoothly to improve the user experience. Furthermore, we show that in a multi-client scenario, the proposed algorithm efficiently utilizes the bandwidth and the competing clients achieve comparable video rates.