1 Introduction

The use of sophisticated personal wireless devices (laptops, netbooks, smartphones, tablets etc.) capable of the playback of streamed video has risen sharply in recent years. These portable devices employ a number of diverse radio access network (RAN) technologies such as Wi-Fi, 3G and WiMAX, each of which exhibits different network characteristics in terms of bandwidth, delay, jitter, packet loss ratio and other network attributes. Users of these devices increasingly expect to use their portable devices “anywhere, anytime”, including whilst travelling on public transport. However, current mobile network infrastructures often fail to deliver a consistent, high-bandwidth and low-latency network connection to nomadic users, who often run into a scenario where there is insufficient bandwidth on any single network path to ensure delivery of a media (in particular video) stream from server to client.

The emerging wireless networking paradigm known as Mobile Networks [5] addresses the mobility requirements of groups of mobile nodes travelling together in unison. Mobile nodes in a mobile network no longer directly connect to the Internet service provider. Instead, they connect to a local device (Mobile Router or MR) within the mobile network that handles mobility on behalf of all nodes. The mobile routers in multihomed mobile networks [19] offer multiple network access paths employing heterogeneous RAN technologies, which can be accessed and used simultaneously. A user, therefore, has access to all of the network service providers available to the MR. Figure 1 shows the network topology of streaming over a multihomed mobile network in a public transportation context.

Fig. 1 Topology of streaming over a multihomed mobile network

It is very challenging to deliver high-quality video streams reliably to nomadic users within multihomed mobile networks. One of the most difficult challenges is that of delivering video of acceptable quality over network paths where the available bandwidth is very low, such as those often encountered in current real-life mobile networks.

To support video applications for resource-constrained users, H.264 Scalable Video Coding (SVC) [16] has recently been standardised as an extension to the H.264 Advanced Video Coding standard (AVC) [20]. Its scalable nature facilitates the adaptation of video streams in response to varying network conditions and terminal capabilities. In SVC, a stream consists of an essential AVC-compliant base layer, providing a minimum quality of video, and a number of optional enhancement layers, which improve the quality of the received stream. The three dimensional scalability of SVC utilises picture resolution, frame rate and signal-to-noise ratio to provide spatial, temporal and quality enhancement dimensions and can be flexibly exploited for network or terminal adaptation of streams.

In order to safeguard valuable network resources, a sender may send only those layers that a client node is capable of processing. When the network is congested or there is insufficient bandwidth to deliver the entire stream, a Media Aware Network Element (MANE) may drop higher enhancement layers to reduce the bandwidth requirement and ensure the delivery of the base layer and lower enhancement layers, which are more important to a user’s perceived video quality. Sacrificing all or part of the upper enhancement layers in this manner provides the user with an acceptable (albeit lower-quality) video and makes efficient use of the available bandwidth. Figure 2 demonstrates such a selective packet-dropping scenario, where the higher enhancement layer (EL2) of a stream is dropped whilst the base layer (BL) and the lower enhancement layer (EL1) are delivered to a mobile user.

Fig. 2 Scalable video stream adaptation

Streaming algorithms have previously been proposed to exploit the aggregated bandwidth of multiple network paths for the delivery of MPEG2 and MPEG4 encoded video content over wired [7], wireless [2, 3, 6] or mobile [11] networks. Some of these schemes [7, 11] have employed generic packet prioritisation to facilitate selective packet dropping when network conditions do not permit delivery of the entire stream. However, previous schemes have rarely investigated the use of H.264/SVC specific packet prioritisation mechanisms or addressed the issue of video stream delivery in the bandwidth-constrained situations which are commonplace in multihomed mobile networks, even after bandwidth aggregation.

In this work, we empirically investigate mechanisms to address the problem of streaming in resource-constrained multipath environments. Firstly, we extend our previous work in [11] by exploring a Quality of Experience (QoE) aware transmission scheme where an SVC-specific selective packet-dropping mechanism utilises a Quality Layers [1] based packet prioritisation scheme. Secondly, we introduce a base-layer rate control scheme to improve streaming performance at ultra-low bandwidth. These packet prioritisation and rate control schemes are then fully integrated with the bandwidth aggregation and multipath scheduling algorithm proposed in [11] to provide a comprehensive, fully functional streaming evaluation framework.

This work substantially extends our previous work [12] and makes several new contributions. Our study of H.264/SVC streaming in resource-constrained multipath environments provides an extensive set of experimental results obtained on a realistic multihomed mobile networks testbed. We give an insight into the trade-off between the benefits and costs of path selection and switching in multipath transmission. We also explore two different priority weighting schemes for H.264/SVC packet prioritisation, highlight the improvements that can be achieved by the use of base-layer rate control at ultra-low bandwidths, and identify performance gaps for future research.

The rest of this paper is organised as follows. Section 2 introduces the packet prioritisation and rate control schemes considered. Section 3 presents the testbed environment, Section 4 presents the results of our experiments and Section 5 draws the conclusions.

2 Packet prioritisation and rate control

The H.264/SVC standard specifies two possible methods of extracting a scalable bitstream, the Basic Extraction method and the Quality Layers method. Figure 3 outlines the two options. Additionally, an alternative packet prioritisation method proposed in [7] is described in this section together with the two SVC-specific ones.

Fig. 3 H.264/SVC bitstream extraction options

2.1 Prioritisation based on frame types and differentiation of base and enhancement layers

While still based on I, B and P frames, the priority scheme proposed in [7] by Jurca and Frossard also differentiates between base and enhancement layers for video streams of a generic scalable format (not specifically H.264/SVC). The approximate weights chosen are shown in Table 1. The higher the weight, the more important the packet is. However, there is no differentiation in the weighting of enhancement-layer packets, which all carry the lowest priority in the scheme.

Table 1 Prioritisation method by Jurca and Frossard [7]

Nevertheless, the importance of the work in [7] is that network resources are not wasted by sending packets that cannot be decoded either due to delay issues or due to unmet dependencies (i.e. the packet to be decoded depends on an ancestor that had been dropped).

2.2 Prioritisation using the basic extraction method

In order to extract a sub-stream of an SVC bitstream to match a target transmission rate, the Joint Scalable Video Model (JSVM) [14] developed by the Joint Video Team defines a “Basic” Extraction method in which the dependency_id (Did), temporal_id (Tid) and quality_id (Qid) of each Network Abstraction Layer (NAL) unit are used to determine its priority. NAL units are the building blocks of an H.264 encoded video stream, with encoded video data being placed into NAL units by the encoder. Each NAL unit consists of an integer number of bytes, where the first byte is an H.264/AVC compliant header (to allow base layer decoding by H.264/AVC decoders), the next three bytes are the H.264/SVC extension header and the remainder is the payload (encoded video data).
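The header structure described above can be illustrated with a minimal parsing sketch. It assumes the bit layout of the three-byte SVC extension header as published for NAL unit types 14 and 20 (priority_id in the low six bits of the first extension byte, dependency_id and quality_id in the second, temporal_id in the top three bits of the third); the field called simple_priority_id in this paper is assumed to correspond to priority_id. The class and function names are hypothetical and are not part of JSVM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SvcNalHeader:
    nal_unit_type: int
    priority_id: int     # assumed to correspond to simple_priority_id
    dependency_id: int   # Did
    quality_id: int      # Qid
    temporal_id: int     # Tid


def parse_svc_nal_header(nal: bytes) -> SvcNalHeader:
    """Parse the 1-byte AVC header and the 3-byte SVC extension header."""
    if len(nal) < 4:
        raise ValueError("need at least 4 header bytes")
    nal_unit_type = nal[0] & 0x1F          # low 5 bits of the AVC header
    if nal_unit_type not in (14, 20):      # only these types carry the extension
        raise ValueError("NAL unit type has no SVC extension header")
    return SvcNalHeader(
        nal_unit_type=nal_unit_type,
        priority_id=nal[1] & 0x3F,         # low 6 bits of extension byte 1
        dependency_id=(nal[2] >> 4) & 0x07,
        quality_id=nal[2] & 0x0F,
        temporal_id=(nal[3] >> 5) & 0x07,
    )
```

A scheduler could call such a parser once per NAL unit and cache the result, since the four header bytes are all that is needed to rank a unit.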

The capabilities of the receiving device normally determine the maximum spatial and temporal resolution of an extracted bitstream. The base quality layer (quality_id = 0) of every spatial and temporal layer whose resolution is lower than or equal to that of the receiver is added to the bitstream. The NAL units of the lower spatial layer are then added in increasing order of their temporal_id. At the target spatial resolution, NAL units are ordered in increasing order of quality_id. This Basic Extraction method is shown in Fig. 4a and b for the base layer and the enhancement layers, respectively. NAL units in the base layer have higher priority than those in the enhancement layers.

Fig. 4 Priority order for NAL units using the Basic Extraction for (a) base layer and (b) enhancement layers
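One possible reading of the Basic Extraction ordering is sketched below as a sort key. It is a simplification rather than the JSVM implementation: base-quality units (quality_id = 0) are placed first in increasing dependency_id and temporal_id, followed by quality refinements in increasing quality_id. The `Nal` tuple and the function name are hypothetical.

```python
from typing import List, NamedTuple


class Nal(NamedTuple):
    did: int   # dependency_id
    tid: int   # temporal_id
    qid: int   # quality_id
    seq: int   # position in the original stream (tie-breaker)


def basic_extraction_order(nals: List[Nal], max_did: int, max_tid: int) -> List[Nal]:
    """Order NAL units from most to least important (simplified sketch)."""
    # Keep only layers the receiver can use.
    eligible = [n for n in nals if n.did <= max_did and n.tid <= max_tid]

    def key(n: Nal):
        if n.qid == 0:
            # Base quality first: lower spatial layer, then increasing temporal_id.
            return (0, n.did, n.tid, 0, n.seq)
        # Quality refinements afterwards, in increasing quality_id.
        return (1, n.did, n.qid, n.tid, n.seq)

    return sorted(eligible, key=key)
```

Units at the tail of the returned list would be the first candidates for dropping when bandwidth is short.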

2.3 Prioritisation based on quality layers

In this work, we also exploit the method of quality layers, which is included in the SVC standard as a means of improving the overall rate-distortion performance of extracted bitstreams. According to this method, a high-level syntax element (simple_priority_id) is placed in the NAL unit extension header. When it is desired to embed quality layer information in a scalable bitstream, this syntax element is assigned a value from the post-processing tool included in the JSVM reference software. A priority weighting is assigned to each NAL unit based on its contribution to the overall picture quality. Quality layer information may optionally be carried in SEI messages rather than the simple_priority_id syntax element. The use of quality layers assigns priority weightings to elements of the entire scalable bitstream.

Quality layer information can be calculated using the technique described in [1, 17] and outlined here. A picture i is encoded with different spatial layers. Within each of these layers, a picture is encoded using a base layer and quality refinement levels q. The rate-distortion values R(d, i, q), D(d, i, q) are calculated for each picture i, where d represents the dependency or spatial layer value. The rate-distortion parameters for all pictures are then utilized to provide the rate-distortion curve of the encoder. Rate-distortion points on the convex hull of the curve are then sorted according to their rate-distortion slope and the quality levels are calculated from this slope. Figure 5 illustrates a typical rate-distortion curve [1].

Fig. 5 Example of a rate distortion curve [1] ((c) 2007 IEEE)
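The slope-based ranking step can be sketched as follows. This is an illustration only: it omits the convex-hull filtering described above, and the convention that level 0 marks the steepest slope (the most valuable refinement) is an assumption chosen to agree with the extraction rule in which higher quality layers are removed first. The function name and input format are hypothetical.

```python
from typing import Dict, List, Tuple


def assign_quality_levels(points: List[Tuple[str, float, float]]) -> Dict[str, int]:
    """Rank refinement units by rate-distortion slope.

    points: (unit_id, delta_rate, delta_distortion), where delta_rate > 0 is
    the extra rate a unit costs and delta_distortion is the distortion
    reduction it buys.
    """
    for unit_id, delta_rate, _ in points:
        if delta_rate <= 0:
            raise ValueError(f"non-positive rate increment for {unit_id}")
    # Steepest slope first: most distortion reduction per unit of rate.
    ranked = sorted(points, key=lambda p: p[2] / p[1], reverse=True)
    return {unit_id: level for level, (unit_id, _, _) in enumerate(ranked)}
```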

The quality layer information is used to extract an optimised bitstream at the encoder while in this work we extend its use for network adaptation of a bitstream. The quality layer based extraction method operates differently for single spatial resolution streams and for multi-resolution streams, respectively. For single resolution streams, a target quality layer is calculated from the target bit rate and all NAL units whose quality layer (simple_priority_id) is higher than the target quality layer are then removed from the bitstream. NAL units with a quality layer equal to the target layer may also be truncated.
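For the single-resolution case, the extraction rule above can be sketched as below. The sketch follows the convention stated in the text (units with a quality layer higher than the target are removed) and, as a simplification, does not truncate units at the target layer. The byte budget stands in for the target bit rate over one extraction window; both function names are hypothetical.

```python
from typing import List, Optional, Tuple

Unit = Tuple[int, int]  # (quality_layer, size_bytes)


def target_quality_layer(units: List[Unit], budget_bytes: int) -> Optional[int]:
    """Highest quality layer whose cumulative size still fits the budget."""
    per_layer = {}
    for layer, size in units:
        per_layer[layer] = per_layer.get(layer, 0) + size
    total, target = 0, None
    for layer in sorted(per_layer):
        if total + per_layer[layer] > budget_bytes:
            break
        total += per_layer[layer]
        target = layer
    return target


def extract(units: List[Unit], budget_bytes: int) -> List[Unit]:
    """Remove every unit whose quality layer exceeds the target layer."""
    target = target_quality_layer(units, budget_bytes)
    if target is None:
        return []
    return [u for u in units if u[0] <= target]
```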

For the multi-resolution case, quality refinements of the highest spatial layer are only used if quality refinements of the previous layer are all present in the extracted stream. Quality layer based extraction is illustrated in Fig. 6. For brevity, only enhancement-layer NAL units in the single spatial resolution case are shown in Fig. 6, using two temporal layers (T0, T1) and two quality layers (Q1, Q2). As Q0 is a base-layer NAL unit, it is not shown in Fig. 6, but it appears in the Basic Extraction scheme shown in Fig. 4a.

Fig. 6 Priority order for NAL units in the enhancement layers using the quality layer based extraction method (single resolution case)

2.4 Base-layer rate control

Rate control is a mechanism employed in video encoders to provide a quality optimised video stream under the constraint of channel bandwidth. This is achieved by adjusting the quantisation parameters. Rate control regulates delivery of the encoded bit stream to meet a given channel bandwidth constraint, while also optimising the video quality.

The current JSVM reference software [17] uses a model based on the work of Leontaris et al. [8]. Rate control is applied to a basic unit (BU), which can be as small as a macroblock or as large as a frame. The authors of [9] report that a larger number of macroblocks in a basic unit results in an improvement in peak signal-to-noise ratio (PSNR); however, this is achieved at the cost of slightly increased variability in the bit rate of the encoded stream. In our testing, we used a BU size of 99 to maximise the PSNR.

The basic principles of rate control, as applied to SVC, are briefly described here. A starting quantisation parameter (QP) is computed for every group of pictures (GOP) except the first, which has a pre-defined initial QP, and a target bits estimate is obtained for the basic unit using the bit allocation mechanism. The starting QP, QP_st, is given by (1), where N_p is the total number of P frames in the previous GOP, Sum_PQP is the sum of the quantisation parameters of all P frames in the previous GOP, T_r is the total number of remaining bits for all non-coded frames and N_gop is the total number of frames in a GOP. QP_st adapts in response to changes in the GOP length or available channel bandwidth. The I frame and the first P frame are coded using QP_st.

$$ QP_{st} = \frac{Sum_{PQP}}{N_p} - 1 - \frac{8\,T_r(n_{i-1,N_{gop}})}{T_r(n_{i,0})} - \frac{N_{gop}}{15} $$
(1)
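As a worked illustration of (1), the sketch below evaluates the expression directly. It assumes that the numerator term is the number of bits remaining after the last frame of the previous GOP and that the denominator term is the number of bits available at the start of the current GOP; any rounding or clipping applied by an encoder is left out. The function and argument names are hypothetical.

```python
def starting_qp(sum_pqp: float, n_p: int,
                tr_prev_gop_end: float, tr_cur_gop_start: float,
                n_gop: int) -> float:
    """Evaluate equation (1) without rounding or clipping."""
    if n_p <= 0:
        raise ValueError("previous GOP must contain at least one P frame")
    if tr_cur_gop_start == 0:
        raise ValueError("bit budget at the start of the GOP must be non-zero")
    return (sum_pqp / n_p
            - 1
            - 8 * tr_prev_gop_end / tr_cur_gop_start
            - n_gop / 15)
```

For example, with ten P frames whose QPs sum to 280, 1000 bits left over from the previous GOP, 8000 bits available to the current GOP and a GOP length of 15, the expression gives 28 - 1 - 1 - 1 = 25.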

This initial estimate allocates approximately the same number of bits for each picture to reach the desired target buffer level. There is no rate-distortion (R-D) optimisation at this stage. A linear Mean Absolute Difference (MAD) model is used to predict the MAD of the current BU from that of the BU at the co-located position in the previous frame. This overcomes the chicken-and-egg problem where the QP is required for both rate control and rate-distortion optimisation of the current BU but cannot be derived at this stage in the process. The target number of texture bits is then estimated from the number of header bits used in previous basic units. Based on the knowledge that the number of target texture bits is correlated with the QP, a target QP for the basic unit is derived using a quadratic model. During the encoding process, the target bit rate and the upper and lower bounds of the QP are provided to the encoder. In the context of this work, an SVC base layer can be adapted in a quality layer based, distortion-efficient manner to produce base layers with ultra-low bandwidth requirements, which are easier to manage in the challenging mobile networks environment where bandwidth scarcity is commonplace.
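The two models mentioned above can be sketched as follows. The quadratic form used here, T = c1·MAD/Q + c2·MAD/Q², is the form commonly associated with H.264 rate control and is an assumption, since this paper does not state it; the coefficients a1, a2, c1 and c2 are hypothetical placeholders that an encoder would update by regression. The sketch returns a quantisation step size, and the mapping from step size to QP is omitted.

```python
import math


def predict_mad(prev_mad: float, a1: float = 1.0, a2: float = 0.0) -> float:
    """Linear MAD prediction from the co-located BU of the previous frame."""
    return a1 * prev_mad + a2


def qstep_from_target_bits(texture_bits: float, mad: float,
                           c1: float, c2: float) -> float:
    """Solve T = c1*MAD/Q + c2*MAD/Q**2 for the positive root Q."""
    if texture_bits <= 0 or mad <= 0:
        raise ValueError("texture_bits and mad must be positive")
    b = c1 * mad
    disc = b * b + 4.0 * texture_bits * c2 * mad
    if disc < 0:
        raise ValueError("model has no real solution for these coefficients")
    return (b + math.sqrt(disc)) / (2.0 * texture_bits)
```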

3 Implementation

3.1 Testbed design

In this work, we have implemented and compared video streaming algorithms for the H.264/SVC standard based on the packet prioritisation schemes and rate control discussed in Section 2. We have used our hardware-based multihomed mobile networks testbed, which provides a practical Linux user-space environment that can replicate realistic scenarios. The testbed, whose topology, elements and major functionalities are shown in Fig. 7, consists of standard PCs running Ubuntu Linux for the video streaming server, the mobility-management home agent (HA), the core routers (CR1 and CR2), the Mobile Router (MR) and the Mobile Network Nodes (MNNs). Two paths are provided between the video streaming server and the multihomed mobile network. Each path consists of a 100 Mbps wired Ethernet link, incorporating a core router running wide-area network emulation and path monitoring modules, and an IEEE 802.11g wireless link offered by a modified Linksys WRT54GL wireless router.

Fig. 7 Topology of our multihomed mobile networks testbed

All PCs used in the testbed except the streaming server have 3.4 GHz Pentium 4 processors. Mobility agents and core routers have 1 GB of RAM with the client node having 2 GB of RAM. The streaming server has an Intel Core i5-670 3.4 GHz CPU, 4 GB of RAM and a solid-state hard drive. Mobility management is provided by NEMO [5] running at both the home agent and the mobile router.

3.2 Implementation of streaming algorithms based on prioritisation schemes

For each video test sequence used, three packet prioritisation methods for H.264/SVC are implemented: the SVC Basic Extraction method, the rate-distortion-optimised Quality-Layers-based method described in this paper and the scheme proposed in [7]. Quality layer information is embedded in the simple_priority_id element of the NAL unit extension header rather than in the optional quality layers SEI message. Fig. 8 highlights the streamer-side system processes and Fig. 9 shows the streaming algorithm using Quality Layers.

Fig. 8 Overview of the streaming framework components and operation

Fig. 9 Streaming algorithm (Quality Layers case)

Firstly a number of pre-processing steps are applied. Within the pre-processing module shown in Fig. 8, we calculate and embed the priority weighting for each NAL unit in the stream as described in lines 107-108 of Fig. 9. It is noted that the Quality Layers case is shown in Fig. 9 and that these steps are different for the other two prioritisation schemes. At this stage a decision is also made on the encoding and extraction bit rates to be used (Fig. 9 lines 102-105) with a scalable stream being encoded and extracted at the chosen rates (Fig. 9 lines 106, 109). The method for obtaining target encoding and extraction bit rates is described in Section 3.3.

On a pre-fetch window basis, we sort the NAL units in order of priority (according to the schemes described in Section 2). The pre-fetch window is the ‘window of knowledge’ of the stream that the scheduling/dropping mechanism holds in its buffer when making path selection and packet scheduling/dropping decisions. It was shown in [7, 11] that a small read-ahead window was sufficient when using multipath aggregation schemes. The experiments conducted for this paper all used a read-ahead window size of one GOP. For every scheme, the base-layer NAL units appear first as they are given the highest level of protection. In [7], base-layer NAL units carry different weightings according to their I, B or P frame classification, as shown in Table 1. The other two schemes do not differentiate base-layer NAL units according to frame type, so these units are ordered in the same way as they appear in the original stream. Enhancement layers are then ordered for each scheme as described in Section 2.
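The per-window ordering can be sketched as below, assuming a window of one GOP as used in our experiments. Each NAL unit is represented by a hypothetical dictionary carrying its GOP index, a base-layer flag and a scheme-specific weight (higher meaning more important); these field names are illustrative only. Because Python's sort is stable, base-layer units with equal weights keep their original stream order, which matches the behaviour described for the two SVC-specific schemes.

```python
from itertools import groupby
from typing import Dict, List


def order_by_window(nals: List[Dict]) -> List[Dict]:
    """Sort NAL units by priority within each GOP-sized pre-fetch window.

    nals must be in stream order, so units of one GOP are consecutive.
    """
    ordered: List[Dict] = []
    for _, window in groupby(nals, key=lambda n: n["gop"]):
        # Base layer first, then enhancement units by decreasing weight.
        ordered.extend(
            sorted(window, key=lambda n: (not n["is_base"], -n["weight"]))
        )
    return ordered
```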

The packet encapsulation (packetisation) module shown in Fig. 8 accepts NAL units and encapsulates them for transmission either as one NAL unit per RTP packet, or by using a Single-Time Aggregation Packet (STAP) [21]. STAP packets contain both the AVC base-layer NAL unit and the Type-14 NAL unit, which provides the base-layer packets within the stream with scalability information.

RTP packets are then passed to the path selection and packet scheduling modules (Fig. 9 lines 124-148), which first determine whether there is a viable path that can deliver the packet to the client in time to be of use in the decoding process. If more than one of the available paths is viable, the best path is chosen using the algorithm we developed in [11], which takes account of all relevant factors such as the available bandwidth and delay of each path, packet size, network overheads and path-switching cost. Where no viable path can be found, the packet is dropped. It should be noted that any packet containing a NAL unit that relies on a previously dropped NAL unit for decoding is not scheduled for transmission. Where a packet is to be sent on any path other than the last used path, a path-switching signal is sent to the home agent, where the path-switching module implements the change. The packet is then directed to the appropriate outgoing interface of the home agent. The switching method employed is the one we used in [19].
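The decision flow above can be sketched in a few lines. This is not the algorithm of [11]: the arrival-time estimate (serialisation time plus path delay plus a fixed switching penalty) and the rule of picking the earliest estimated arrival are simplifying assumptions, network overheads are ignored, and all class, field and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set


@dataclass
class Path:
    name: str
    bandwidth_bps: float
    delay_s: float


@dataclass
class Packet:
    nal_id: int
    size_bytes: int
    deadline_s: float                 # latest useful arrival time at the client
    depends_on: Optional[int] = None  # NAL unit needed for decoding, if any


def select_path(pkt: Packet, paths: Sequence[Path], now_s: float,
                last_path: Optional[str], switch_cost_s: float,
                dropped: Set[int]) -> Optional[Path]:
    """Return the chosen path, or None if the packet is dropped."""
    # Never send a packet whose ancestor was already dropped.
    if pkt.depends_on is not None and pkt.depends_on in dropped:
        dropped.add(pkt.nal_id)
        return None
    best, best_arrival = None, None
    for path in paths:
        if path.bandwidth_bps <= 0:
            continue
        arrival = now_s + pkt.size_bytes * 8 / path.bandwidth_bps + path.delay_s
        if last_path is not None and path.name != last_path:
            arrival += switch_cost_s  # penalty for changing path
        if arrival <= pkt.deadline_s and (best_arrival is None or arrival < best_arrival):
            best, best_arrival = path, arrival
    if best is None:
        dropped.add(pkt.nal_id)       # no viable path: drop
    return best
```

Recording dropped NAL unit identifiers in a shared set lets later packets that depend on them be discarded before any path evaluation takes place.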

3.3 Determining target bit rates for base-layer rate control

SVC offers the facility to adapt a video stream to current network conditions, and there are two ways to achieve this. Where there is a known fixed (perhaps quality of service [QoS] guaranteed) bandwidth, the stream can be extracted in a rate-distortion optimised manner by using the known bandwidth as the target bit rate with any temporary reduction in capacity being handled by selectively dropping NAL units. In this scenario, the base-layer rate control would be chosen to reflect the lowest client device capability envisaged.

As multihomed mobile networks provide more complex, dynamically changing network paths, another means is required to determine the most appropriate target bit rates for both base-layer rate controlled encoding and scalable stream extraction. The base-layer rate control target bit rate should reflect the lowest viable, non-zero aggregated bandwidth that is likely to be encountered during the course of the journey undertaken in the mobile network (e.g., public transport vehicle). This will provide a rate-distortion optimised bit stream at very low aggregated bandwidths. However, in the case of the target bit rate for scalable stream extraction, the optimisation problem of selecting the best stream extraction rate, such that no bandwidth will be wasted and that any adaptation of the stream in response to network fluctuations will be R-D optimal, needs to be considered. Towards solving this optimisation problem, in this work we select a target extraction bit rate that matches the highest anticipated aggregated bandwidth available to the mobile network and propose a simple scheme to keep track of historical data that can be used to determine the most appropriate target bit rates for any given mobile network. The main components and operation of the route history sub-system are shown in Fig. 10. In a public transport situation, it would be sufficient to know the highest and lowest (non-zero) bandwidths historically encountered on any specific transport route.

Fig. 10 Overview of route history subsystem

We propose the use of simple cooperating agents at the mobile router, the home agent and the streaming server in order to provide data to the encoder. At the MR, an agent reports its current route identity to the HA, where this data is stored in a database containing the current mobile network IP address prefix and the current route number. Whenever a client node initiates a session with the streaming server, the agent at the server sends a request to the HA asking for the route number associated with the mobile network prefix of the MR to which the client node’s traffic is being directed. The streaming server is then able to build and maintain a historical database of route numbers and the lowest, highest and mean aggregated bandwidths encountered on these routes. This data can then be used to provide the target base-layer rate control and scalable stream extraction bit rates to the encoder. Fig. 10 illustrates such a route monitoring and retrieval scheme.
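A minimal in-memory sketch of the server-side route history is given below. It assumes that aggregated bandwidth samples are reported per route number and that zero-bandwidth samples are ignored, following the "lowest non-zero" rule above; the class name, method names and result keys are hypothetical, and a deployed system would use persistent storage rather than a dictionary of lists.

```python
from collections import defaultdict
from typing import Dict, Optional


class RouteHistory:
    """Tracks aggregated bandwidth samples (Kbps) per transport route."""

    def __init__(self) -> None:
        self._samples = defaultdict(list)

    def record(self, route: str, bandwidth_kbps: float) -> None:
        # Zero (no connectivity) samples are not useful as encoding targets.
        if bandwidth_kbps > 0:
            self._samples[route].append(bandwidth_kbps)

    def targets(self, route: str) -> Optional[Dict[str, float]]:
        samples = self._samples.get(route)
        if not samples:
            return None
        return {
            # Lowest non-zero bandwidth: base-layer rate control target.
            "base_layer_rc_kbps": min(samples),
            # Highest bandwidth: scalable stream extraction target.
            "extraction_kbps": max(samples),
            "mean_kbps": sum(samples) / len(samples),
        }
```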

4 Experimentation and results

We conducted two sets of experiments. The first set of experiments studied the performance of packet prioritisation schemes in “higher” aggregated bandwidth situations where the aggregated bandwidth of all available paths ranged from 50 % to 110 % of the total bandwidth requirement of each test sequence. This scenario replicates the public transport situation encountered in large metropolitan areas where a number of service providers will have highly developed network infrastructures. In the second set of experiments we examined the use of base-layer rate control to enhance delivery of SVC video in “lower” bandwidth situations where the available aggregated bandwidth was below 256 Kbps, replicating the conditions often found in current mobile networks operating outside the major conurbations.

The raw video sequences were encoded using the JSVM SVC reference software [17]. We used the Soccer sequence (4CIF/60fps, 4CIF/30fps, QCIF/15fps), the Bus sequence (CIF/30fps, QCIF/15fps) and the Foreman sequence (CIF/30fps, QCIF/15fps). All streams were encoded using a single spatial layer, and multiple temporal and quality layers. Some tests used a fixed bandwidth, end-to-end delay and path ratio for the duration of the test, while in others the available bandwidth, end-to-end delay and path ratios were changed dynamically during a streaming session to replicate a more realistic scenario. Results presented are mean values. Two bandwidth aggregation scenarios are provided: in an equal path situation, the required bandwidth is equally distributed across the two paths; in the high differential path situation, one path is allocated 80 % of the required bandwidth and the other path 20 %. Our choice of bandwidth aggregation scenarios attempts to provide a realistic replication of the path conditions that may be encountered in a real multihomed mobile network. Path monitoring and changing mechanisms used are those that we developed in [11]. For each test run, a streamer trace file is generated containing details of the scheduling decision made on each packet with summary statistics of numbers of packets dropped and network overheads and path-switching costs added. Error concealment in the received stream was performed using the Frame Copy (FC) method.

In addition, the client also produced a time-stamped trace file of all packets received allowing analysis of out-of-sequence or late arrival and the identification of packets that cannot be used due to unmet dependencies arising from packets dropped or lost in the network. We correlated the network path conditions and changes to the scheduling decisions using both streamer and network generated statistics. Wireshark [22] was deployed at all nodes in the network to permit verification of packet statistics generated by the components of our framework.

4.1 Results – higher bandwidths

4.1.1 Video quality performance measurements

This section evaluates the effectiveness of prioritisation schemes with higher aggregated bandwidth. When taken across all test sequences and bandwidth ranges, both the JSVM Basic Extraction method and the Quality Layers method show a significant improvement in PSNR over the prioritisation method used in [7]. The Quality Layers based method provided an improvement in PSNR of up to 1.5 dB (mean improvement = 1.21 dB) over the scheme used in [7] and also outperformed the Basic Extraction method based scheme by up to 0.14 dB. The Basic Extraction scheme showed an average performance gain of 1.115 dB over the scheme from [7].
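For reference, the PSNR metric used throughout this section can be computed as sketched below for 8-bit samples. The sketch operates on flat sequences of sample values for a single frame; averaging over frames, and the choice of which colour plane to measure, are left out and would need to follow the evaluation tool actually used. The function name is hypothetical.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[int], decoded: Sequence[int], peak: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("frames must be non-empty and of equal size")
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak ** 2 / mse)
```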

In Figs. 11-13, the PSNR performance of a single ‘ideal’ path, with available bandwidth equivalent to the aggregated bandwidth of the two paths, is compared with that of a two-path equal split scenario and a two-path 80:20 split, using each of the three prioritisation schemes. It can be clearly seen that the single path provides a higher PSNR than either of the aggregated multipath methods. This is due to the path-switching cost in multihomed mobile networks. Path-switching operations take place when the current path cannot meet the transmission requirement and an alternative path has to be employed. By introducing delays, a path-switching operation reduces the number of packets that can be sent in time to be of use in the decoding process compared to the single path scenario.

Fig. 11 PSNR comparisons for the Soccer sequence

Fig. 12 PSNR comparisons for the Bus sequence

Fig. 13 PSNR comparisons for the Foreman sequence

Compared with the equal bandwidth scenario, an unequal distribution of bandwidth across the paths means that one path carries a higher proportion of the stream; path-switching operations are therefore reduced and the resulting PSNR of the received stream is higher. It is noted that there is no path-switching cost in single path transmissions. The single path results are provided to highlight the costs associated with multipath streaming in a multihomed mobile networks environment, where the path-switching cost is a severe limiting factor.

In the single path scenario we drop packets based on the priority weighting attached using the same scheduling mechanism as for the multipath scheme with the path selection (switching) module disabled, i.e. all packets are either sent on the current path or dropped due to the inability to deliver on time or unmet dependencies. Although there are variations in the PSNR improvement offered by JSVM and Quality Layers methods over the Jurca and Frossard [7] scheme between different sequences, the general improvement is shown to be valid for all of the streaming scenarios investigated. They all show a higher PSNR of typically just over 1 dB for JSVM and Quality Layers methods, with the Quality Layers method performing slightly better.

4.1.2 Packet/NAL unit delivery statistics

One important measure of the success of a streaming system is that it is able to successfully protect and ensure delivery of base-layer NAL units, which are the most important video contents. In our tests, the incidence of base-layer NAL unit loss is very low. Fig. 14 shows that less than 0.2 % of all base-layer NAL units were lost.

Fig. 14 Comparison of base-layer NAL unit loss by prioritisation scheme

Table 2 shows the average number of out-of-sequence packets arriving at the client across all tests using all three prioritisation methods. The incidence of out-of-sequence delivery was low for all tests, with no significant variation between prioritisation schemes; individual results for each prioritisation method are therefore omitted for brevity. This can be expected, as out-of-sequence delivery is primarily determined by the path-condition-aware scheduling algorithm, which is based on the estimated arrival time at the client, and the base-layer NAL units were given the highest priority in all three prioritisation methods. The number of packets successfully delivered to the client (packet delivery ratio) for the Soccer sequence (4CIF/60fps) is shown in Fig. 15. The number of packets that are usable in the decoding process is also shown. Although, for brevity, only one test sequence is illustrated, the results shown in Fig. 15 were consistent across all test sequences, with the Quality Layers scheme performing best and the JSVM basic scheme achieving comparable performance.

Table 2 Out-of-sequence packet delivery (% of total packets)

Fig. 15 Packets arriving at the client for the Soccer sequence

We have previously identified that the path-switching operation is a severe limiting factor in multihomed mobile networks. A path switching ratio can be defined as the percentage of path selection decisions that lead to a path switching operation. Fig. 16 demonstrates the substantial difference in path switch frequency when high differential paths are used. In all schemes, the high differential paths delivered a higher number of packets to the client thanks to fewer path-switching operations.

Fig. 16 Path switching ratio for each scheme
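The path switching ratio defined above can be computed from a streamer trace as sketched below. The input is assumed to be the ordered list of path names chosen for the packets that were actually sent; dropped packets are assumed to have been filtered out beforehand. The function name and input format are hypothetical and do not reflect the format of our trace files.

```python
from typing import Sequence


def path_switching_ratio(decisions: Sequence[str]) -> float:
    """Percentage of path selection decisions that caused a path switch."""
    if not decisions:
        return 0.0
    # A switch occurs whenever a decision differs from the previous one.
    switches = sum(
        1 for prev, cur in zip(decisions, decisions[1:]) if prev != cur
    )
    return 100.0 * switches / len(decisions)
```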

4.2 Results – lower bandwidths

In this section we provide an experimental evaluation of the effectiveness of using base-layer rate control in combination with Quality Layers based packet prioritisation in ultra-low bandwidth situations. QCIF (176 × 144) versions of the Bus, Foreman and Soccer test sequences were encoded at 15 fps and 30 fps using the JSVM encoder [17] with MGS (Medium Grain Scalability) of both temporal and quality enhancement layers. The lower spatial and temporal resolutions were used to better match the test sequences to the low bandwidth environment. All of the sequences were also processed using the Quality Layers method for packet prioritisation. Each sequence was first encoded without base-layer rate control and then with base-layer rate control enabled. Table 3 shows the target bit rates used for the three sets of test sequences. In Set 1, no rate control (RC) was applied; in Set 2, the base-layer rate control target bit rate was given values ≤ 64 Kbps; in Set 3, the base-layer rate control was allocated values between 64 Kbps and 128 Kbps. Each sequence was first streamed at fixed aggregated bandwidths above the base-layer target bit rate, and the resultant PSNR and packet statistics were recorded. As with the higher bandwidth testing (Section 4.1), the bandwidth was initially equally distributed over the available paths, whilst subsequent tests were performed with a high differential distribution across the two available paths (80:20 split).

Table 3 Target bit rates for the encoded sequences

The results for the lower bandwidth testing on unequal (80:20 split) paths are shown in Fig. 17. Unequal paths showed a similar performance gain over equal paths to that achieved in the higher bandwidth testing. As can be seen from Fig. 17, the average PSNR of the uncontrolled scenario (no rate control) is substantially lower than that of the rate-controlled scenarios at the lower end of the bandwidth spectrum, while the results converge towards the upper end of the testing range. This observation holds true for all test sequences. The scenario where rate control was applied at the lowest possible target bandwidth (≤ 64 Kbps scenario in Fig. 17) provided the highest PSNR across the entire bandwidth spectrum. This demonstrated that, despite the low bandwidth, the vast majority of base-layer packets were still delivered. Furthermore, the streaming mechanism was also able to deliver a large number of enhancement-layer packets in this scenario.

Fig. 17 PSNR comparisons for low bandwidth testing

Consider the results of the Bus sequence testing shown in Fig. 17. When no rate control was applied, the base layer was encoded at a default bit rate of 166 Kbps. It was therefore impossible to deliver the entire base layer at less than 166 Kbps, and a poor PSNR result was obtained for most of the testing range. When rate control was applied in the range between 64 Kbps and 128 Kbps, the PSNR improved as the base-layer (and some enhancement-layer) packets were successfully delivered at lower bandwidths. As discussed above, when the base layer was rate controlled at or below 64 Kbps, many more enhancement-layer packets were delivered than in the non-rate-controlled scenario. This naturally led to a substantial PSNR improvement in the rate-controlled scenario.
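The arithmetic behind the Bus example can be sketched as follows. This is a deliberately simplified budget model, assuming that base-layer packets are always served before enhancement-layer packets and ignoring packetisation and protocol overheads; the function name is hypothetical. The 166 Kbps and 64 Kbps figures are the default and rate-controlled base-layer bit rates quoted above.

```python
from typing import Tuple


def bandwidth_budget(aggregated_kbps: float, base_layer_kbps: float) -> Tuple[float, float]:
    """Split an aggregated bandwidth between base and enhancement layers.

    Returns (fraction of the base layer that fits, Kbps left for enhancement).
    Simplifying assumption: the base layer is served first and any remainder
    goes to enhancement-layer packets; overheads are ignored.
    """
    if base_layer_kbps <= 0:
        raise ValueError("base_layer_kbps must be positive")
    base_sent = min(aggregated_kbps, base_layer_kbps)
    return base_sent / base_layer_kbps, aggregated_kbps - base_sent
```

Under these assumptions, at 96 Kbps of aggregated bandwidth the default 166 Kbps base layer can only be delivered in part (roughly 58 %) with nothing left for enhancement layers, whereas a 64 Kbps rate-controlled base layer fits entirely and leaves 32 Kbps for enhancement-layer packets.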

Taken as an average across all test sequences, the base-layer controlled experiments offered an improvement of 1.59 dB over the uncontrolled experiments at an aggregated bandwidth of 64 Kbps. This was reduced to 0.19 dB at 128 Kbps, where the majority of the uncontrolled base layers were delivered successfully. At 256 Kbps there was very little difference between the PSNR of any of the test sets of the same encoded sequence.

The largest difference was observed when comparing test sets that had a controlled base-layer target of 64 Kbps with test sets of the same sequence with no rate control applied, both streamed at 96 Kbps. In this case there was an average PSNR improvement of 3.07 dB. This significant difference arose because, at 96 Kbps available bandwidth, the controlled stream with a 64 Kbps base layer allowed the delivery of a considerable number of enhancement-layer packets that could not be delivered in the uncontrolled stream.

4.3 Discussion

4.3.1 Impact of prioritisation schemes on dropped NAL units

In our previous work [11], we established that it is possible to deliver SVC encoded video streams within the challenging multihomed mobile network environment using path-condition-aware streaming. In this paper, we empirically investigated the performance of SVC-specific, media-aware prioritisation schemes and the use of base-layer rate control. Our comparison of three packet prioritisation schemes for video streaming in these conditions has established that both the JSVM Basic Extraction scheme and the Quality Layers based prioritisation scheme outperform previously proposed non-SVC-specific methods, represented by Jurca and Frossard’s scheme [7]. The Quality Layers solution also outperforms the JSVM Basic Extraction scheme in PSNR terms.

The reason for this performance improvement over [7] lies in the fact that all enhancement-layer NAL units are given the same priority in [7], while the other two schemes rank enhancement-layer packets according to their importance in the stream as dictated by either the (Did, Tid, Qid) tuple or the assigned quality layer value. Fig. 18 shows a case study comparison of which NAL units were dropped in a GOP of the Soccer sequence in response to a 10 % reduction in aggregated bandwidth on high differential paths. Due to all enhancement NAL units having the same priority in [7], the NAL units dropped are all concentrated at the end of the GOP. The JSVM Basic Extraction and Quality Layers schemes both distributed the dropped NAL units more evenly across the GOP, removing quality enhancement packets at the highest temporal layer. The difference in dropped packets between JSVM Basic Extraction and Quality Layers methods was small.
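The effect of the two kinds of ranking can be illustrated with a small sketch. It is not the implementation used in the testbed: the `NalUnit` type, the two drop-ordering functions and the toy GOP builder are hypothetical, and the tuple-based ordering (highest temporal layer first, then highest quality level) is an assumption chosen only to mirror the behaviour described above. Base-layer units are never dropped in this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class NalUnit:
    index: int  # position in transmission order within the GOP
    did: int    # dependency (spatial) layer id
    tid: int    # temporal layer id
    qid: int    # quality layer id
    size: int   # payload size in bytes

    @property
    def is_base(self) -> bool:
        return self.did == 0 and self.qid == 0


def flat_drop_key(n: NalUnit) -> Tuple[int, ...]:
    # All enhancement units share one priority, so only position decides:
    # the units latest in the GOP are dropped first.
    return (-n.index,)


def tuple_drop_key(n: NalUnit) -> Tuple[int, ...]:
    # Assumed ordering for illustration: highest temporal layer first, then
    # highest quality level, then highest dependency layer; ties by position.
    return (-n.tid, -n.qid, -n.did, n.index)


def select_drops(
    units: Sequence[NalUnit],
    budget_bytes: int,
    drop_key: Callable[[NalUnit], Tuple[int, ...]],
) -> List[int]:
    """Return the indices of enhancement units dropped to fit the byte budget."""
    total = sum(n.size for n in units)
    dropped: List[int] = []
    for n in sorted((u for u in units if not u.is_base), key=drop_key):
        if total <= budget_bytes:
            break
        dropped.append(n.index)
        total -= n.size
    return sorted(dropped)


def make_gop(tids: Sequence[int], size: int = 100) -> List[NalUnit]:
    # Hypothetical GOP: one base and one quality-enhancement unit per frame.
    return [NalUnit(2 * f + q, 0, t, q, size) for f, t in enumerate(tids) for q in (0, 1)]
```

With `gop = make_gop([0, 3, 2, 3, 1, 3, 2, 3])` and a budget of 1200 bytes, `select_drops(gop, 1200, flat_drop_key)` returns `[9, 11, 13, 15]`, concentrated at the end of the GOP, while `select_drops(gop, 1200, tuple_drop_key)` returns `[3, 7, 11, 15]`, spread across the frames of the highest temporal layer.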

Fig. 18 Comparison of which NAL units are dropped by each scheme

An analysis of the correlation between which NAL units were dropped in each scheme is provided in Fig. 19. Of the NAL units dropped by the Quality Layers scheme 92 % were also dropped by the JSVM Basic Extraction scheme, while only 50 % were dropped by the Jurca and Frossard scheme. This correlation can also explain the close PSNR performance for the JSVM Basic Extraction and Quality Layers schemes as well as the higher PSNR performance of these two schemes when compared to the work in [7].
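The percentages quoted above can be read as a one-directional overlap between sets of dropped NAL units. The sketch below shows one way such a figure could be computed; the function name is hypothetical and NAL units are assumed to be identified by integer indices.

```python
from typing import Iterable


def drop_overlap(reference: Iterable[int], other: Iterable[int]) -> float:
    """Percentage of NAL units dropped by `reference` that `other` also dropped."""
    ref = set(reference)
    if not ref:
        return 0.0
    return 100.0 * len(ref & set(other)) / len(ref)
```

Note that the measure is not symmetric: `drop_overlap(a, b)` is taken relative to the units dropped by `a`, which matches the way the comparison is phrased relative to the Quality Layers scheme.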

Fig. 19 Correlation of which NAL units are dropped by each scheme

4.3.2 Impact of prioritisation schemes on the nature of the stream and multipath transmission

Apart from the fact that enhancement-layer packets are dropped in order of their notional priority rather than their position within the stream, we have also identified that when using different prioritisation schemes the nature of the stream changes. This happens in a way that can affect the path-switching frequency, which we have shown substantially affects the PSNR of a received stream. A path change occurs when the current path is unable to deliver the packet on time to be of use in the decoding process.

One significant component of the arrival time estimation is the transmission time, which is determined by packet size and available bandwidth [11]. Changing a stream such that a number of larger packets follow each other may trigger additional path switches, thus introducing additional delays and negating any small improvement in PSNR that can be derived from the use of an alternative priority scheme. The performance improvements of Quality Layers over the JSVM Basic Extraction method that we observed are not as large as those reported by Amonou et al. in [1], who employed single-path streams and operated in less demanding network environments.
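The interaction between packet size and path switching can be sketched as follows. This is a simplified model, not the estimator of [11]: the `Path` fields, the additive delay model and the function names are assumptions, and the path-switching cost and mobility-related overheads considered in our testbed are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Path:
    bandwidth_bps: float        # available bandwidth in bits per second
    delay_s: float              # one-way path delay in seconds
    busy_until_s: float = 0.0   # time at which already queued packets finish sending


def estimated_arrival(path: Path, size_bytes: int, now_s: float) -> float:
    # Transmission time is determined by packet size and available bandwidth.
    start = max(now_s, path.busy_until_s)
    return start + size_bytes * 8 / path.bandwidth_bps + path.delay_s


def needs_switch(path: Path, size_bytes: int, now_s: float, deadline_s: float) -> bool:
    # A switch is needed when the current path cannot deliver the packet in
    # time to be of use in the decoding process.
    return estimated_arrival(path, size_bytes, now_s) > deadline_s


def enqueue(path: Path, size_bytes: int, now_s: float) -> None:
    # Queue the packet on the path, pushing back the time at which it frees up.
    start = max(now_s, path.busy_until_s)
    path.busy_until_s = start + size_bytes * 8 / path.bandwidth_bps
```

On an 80 Kbps path with 50 ms delay, a single 1000-byte packet meets a 0.3 s deadline, but once two such packets are queued back to back a third one no longer does. A run of large packets therefore "fills up" the path sooner and can force a switch that a different packet ordering would have avoided.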

We believe that this is due to the way in which packet size can affect path-switching ratios. The three priority schemes used will each result in a different ordering of packet sizes arriving at the path selection module, which will alter the rate at which individual paths “fill up” and consequently the path-switching frequency. Fig. 15 shows that a small percentage of those NAL units delivered to the client could not be decoded. This is primarily due to the loss during transmission of a packet containing a base-layer NAL unit, which renders its dependent NAL units useless. It is also worth noting that, as with the schemes in [3, 7], path selection and scheduling decisions in our testbed are not based simply on available bandwidth, as they are in [1]. Although [3, 7] also considered path delays and packet reordering, our implementation additionally took into account path-switching cost and mobility-related network overheads. The rate-distortion optimisation of schemes such as the Quality Layers method in [1] is shown here to be diluted by the effects of other factors considered in the path selection and scheduling algorithms.

One of the aims of multipath scheduling schemes is to limit the amount of packet reordering at the client by ensuring that packets arrive in the correct order for decoding. The schemes proposed in [2, 3, 6] and in [7] all aimed to optimise the use of aggregated bandwidth while limiting packet reordering requirements. The authors of [18] highlighted the potential for out-of-sequence delivery in [7]. In a specific set of circumstances, some out-of-sequence packets may be delivered due to the operation of the ‘sliding window’ mechanism. While our implemented scheme uses the same sliding window mechanism as [7], it additionally considers the relatively high path-switching cost associated with multihomed mobile networks, a factor considered by neither Jurca and Frossard [7] nor Tsai et al. [18]. Our technique provides a judicious trade-off between path-switching frequency and effective bandwidth aggregation. Therefore, by limiting the number of path switches and using a mechanism to prevent out-of-sequence delivery immediately after each path switch, we also significantly limit the number of out-of-sequence packets that may be delivered as a result of the issue identified in [18]. As can be seen from Table 2, the number of out-of-sequence packets delivered using our scheme is very low (< 0.4 %) and, given the SVC decoder’s buffer size, should not present any problem at the client. We have not investigated the cause of out-of-sequence delivery in our network, having previously assumed that it arose from retransmissions within the wireless part(s) of our testbed. Despite using orthogonal wireless channels in our testbed, we have identified that other wireless devices being used in the local area do occasionally have an adverse effect on wireless transmissions in our testbed.

Our comprehensive analysis of H.264/SVC streaming within multihomed mobile networks has also shown that there is still a significant gap in streaming performance over multiple paths when compared to an ideal single path (Fig. 20). We also observe that further study to minimise path-switching frequency may yield a more significant improvement in PSNR.

Fig. 20 The remaining performance gap in streaming over multihomed mobile networks

Finally, we note that the H.264/SVC-specific prioritisation methods discussed have gained momentum in recent years in various emerging SVC application scenarios, e.g., media-aware unequal error protection [10], fair multi-stream delivery [4], efficient HTTP streaming [15], and seamless mobile video access [13]. In our case, we proposed to employ the quality layers mechanism as the priority weighting for a selective packet dropping scheme for adaptive streaming in the resource-constrained multihomed mobile networking context.

4.3.3 Impact of base-layer rate control

In the event that the aggregated bandwidth of all available network paths falls below the bandwidth requirement of the base layer, no packets from any enhancement layer will be transmitted and some base-layer packets will be dropped to meet the available bandwidth. Dropping base-layer packets has a significant detrimental effect on the PSNR of the received stream. By applying rate control techniques to the base layer, the bandwidth requirement can be reduced in a rate-distortion optimised manner, with target bit rates being specified that are significantly lower than the default encoder output. The resultant base layer can be transmitted in its entirety and achieves better PSNR results at the client than sending the default encoder base layer with packets dropped due to bandwidth limitations in the network. The results reported in this paper demonstrate that the employment of rate-controlled, low target bandwidth base layers within an SVC stream provides a simple yet robust means of ensuring that the stream can be adapted to dynamic path changes in multihomed mobile networks, which lead to very low available aggregated bandwidth. By employing rate control in conjunction with the Quality Layers mechanism in our low bandwidth experiments, we have shown the rate control scheme to be compatible with and complementary to the main Quality Layers prioritisation method used in this paper.

5 Conclusions

In this work, we have experimentally evaluated streaming mechanisms for H.264/SVC video in a realistic multihomed mobile networks setting. In mobile networks, where bandwidth scarcity is an inherent challenge to resource-demanding video applications, media-aware packet prioritisation schemes play a critical role in determining which packets should be scheduled at the right times and which packets should be dropped in response to changes in network path conditions. We have shown that both the Quality Layers based streaming scheme and the JSVM Basic Extraction based streaming scheme outperform a representative streaming scheme for multipath transmission in the literature. In scenarios of higher aggregated bandwidth (1.5 Mbps or above), a significant PSNR improvement of up to 1.5 dB can be achieved when employing the Quality Layers based scheme, which also outperforms the Basic Extraction based scheme by a small margin of up to 0.14 dB. Further investigation has also shown that the rate-distortion gains achieved by both the Quality Layers and the Basic Extraction schemes are reduced by the effects of the other (mostly network path switching related) factors that must be considered in multipath streaming algorithms. Furthermore, it has been demonstrated that applying base-layer rate control to SVC streams is a useful means of maintaining an acceptable quality of received video in lower-bandwidth mobile networking situations such as those encountered in public transport scenarios. The use of a rate-controlled base layer, with a correctly chosen target bit rate, can increase the PSNR of the received stream by an average of 3.07 dB at an aggregated available bandwidth of 96 Kbps when the Quality Layers based scheme is employed. Finally, we have also contributed a discussion of the impacts of packet prioritisation methods and rate control on the quality of the received stream and identified some of the remaining challenges in this area for future research.
In our future work, we will conduct further performance evaluation using visual quality metrics beyond PSNR and focus on the quality of experience.