1 Introduction

During the second half of 2020, the SBTVD (in Portuguese: Sistema Brasileiro de Televisão Digital – Brazilian Digital Television System) Forum – the entity responsible for developing the standards of the DTTB (Digital Terrestrial Television Broadcasting) system used in Brazil – released a CfP (Call for Proposals) [1] with the requirements of the next generation of the Brazilian DTTB system (named TV 3.0), open to all interested international consortiums and standardization bodies to propose solutions to those requirements.

During the first half of 2021, the SBTVD Forum defined the test procedures to evaluate the candidate solutions, and throughout the second half of 2021, the TV 3.0 Project Test Labs verified the proposed solutions [2].

The CfP requirements comprised several modern features, such as C/N (Carrier-to-Noise ratio) ≤ 0 dB, 2x2 MIMO (Multiple Input Multiple Output), channel bonding, an IP (Internet Protocol) based transport layer for OTA (Over-The-Air) and OTT (Over-The-Top) delivery, broadband and broadcast integration, UHD (Ultra-High Definition) images such as 4K and 8K, HFR (High Frame Rate), and scalable video [1].

The test labs analyzed and evaluated the candidate proposals; as a result, ROUTE (Real-time Object Delivery over Unidirectional Transport) and DASH (Dynamic Adaptive Streaming over HTTP – Hypertext Transfer Protocol) were adopted as the transport layer solutions for OTA and OTT transmission, respectively [3], since they are the most mature technologies among the candidates, and ROUTE-DASH protocol latency has been decreasing as the technology matures.

The VVC (Versatile Video Coding) standard was adopted as the video coding solution for the OTA/OTT layers since it was the candidate that best met all the requirements, despite being a recently standardized technology [4, 5] that is not yet commercially mature. The HEVC (High-Efficiency Video Coding) standard [6, 7] was also adopted as a video solution for OTT [3].

Among the video coding requirements present in the TV 3.0 CfP [1], an important one is video scalability. It allows the same video content to be distributed through different means of communication, with different bitrates, to receivers with different processing capabilities [7].

Three classical types of video scalability are defined in the literature: spatial, temporal, and quality scalability [7]. Spatial scalability allows offering contents with different spatial resolutions, derived from the same original video, to receivers with different features/capabilities. Temporal scalability allows extracting video contents with different temporal resolutions from the same complete video. Quality scalability allows extracting video contents with different quality levels from the same original video [7].

The video scalability feature allows a DTTB station to broadcast the main service as a lower-resolution program, the BL (Base Layer), to all receivers and to transmit complementary content as an EL (Enhancement Layer) to receivers capable of decoding both BL and EL. Video scalability can provide advantages to TV 3.0, such as offering distinct video program resolutions to different receivers and, consequently, different user experiences. Higher image resolutions can also be delivered using lower bitrates than broadcasting the same video content in various resolutions at the same time (simulcasting) [8].

Another important fact in the current Brazilian scenario is broadband growth. According to Anatel (in Portuguese: Agência Nacional de Telecomunicações – Brazilian Telecommunications Agency) [9], the number of fixed broadband Internet connections has been increasing yearly in Brazil, mainly for higher-speed connections, with an average growth of around 10% per year over the last ten years. Currently, Brazil has ~43 million fixed broadband subscriptions with an average connection speed of 220 Mbps; ~90% of the HHs (HouseHolds) have access to the Internet, ~84% of them through a fixed broadband connection [10], and this growth rate is expected to continue in the next years.

Video scalability and hybrid reception have been studied by ETRI (Electronics and Telecommunications Research Institute) Labs [11,12,13].

In [11], an HPHT (High Power High Tower) approach is used to broadcast an LDM (Layered Division Multiplexing) ATSC (Advanced Television Systems Committee) 3.0 signal with SHVC (Scalable High-Efficiency Video Coding) video. The LDM upper layer carries the video BL (720p@60), and the LDM lower layer transports the video EL (2160p@60). ROUTE and DASH are used as the transport layers for OTA and OTT, respectively. The OTA TV signal is used for the primary consumption of a service, and a broadband network can be used to deliver alternative streams whenever broadcast signals are not available (shadow regions).

The authors of [12] propose an 8K video service transmission that remains backward compatible with 4K legacy receivers. UHD 4K (2160p@30) content is transmitted as the SHVC BL, using MMT (MPEG – Moving Picture Experts Group – Media Transport) or ROUTE-DASH as the transport layer via the ATSC 3.0 signal, whereas UHD 8K (4320p@30) content is transmitted as the SHVC EL, using DASH as the transport layer via a broadband network.

The authors of [13] propose using video scalability to offer the final user immersive media (3D). Using the ATSC 3.0 LDM feature, the OTA signal broadcasts two PLPs (Physical Layer Pipes): PLP1 carries the SHVC BL, which contains the HD left-view video, and PLP2 carries the SHVC EL, containing the UHD right-view video. Coordinated by the MPD (Media Presentation Description) file, the receiver is capable of combining both views and offering a 3D image to the user.

Considering the previous scenario, as a contribution to TV 3.0, this paper proposes the use of video scalability and hybrid transmission to deliver UHD 4K video to the final user in order to enhance video quality and user experience. The proposed system uses the ROUTE-DASH method [14] as the transport layer and the ATSC 3.0 physical layer [15]. The 2K video BL is OTA broadcasted, and the 4K EL is OTT transmitted.

This research has some advantages in relation to related works, such as evaluating the latency between BL and EL caused by the Internet CDN (Content Delivery Network) transport of the EL and, from that, suggesting the minimum latency between BL and EL that a TV 3.0 receiver should support. It also proposes a receiver architecture in which BL A/V (Audio/Video) segments are extracted from the RF (Radio Frequency) signal and stored in a local server, where other receivers connected to the internal network can access them, besides combining them with the EL.

This paper details the proposal of video scalability usage to improve the TV 3.0 system and offer an enhanced experience to the final viewer, together with the lab environment used to develop and validate this proposal, the tools developed, the tests executed, and the results achieved. It is organized as follows: Section 2 describes the main DTV technologies used in this research, such as the IP-based transport layer, the types of video scalability, and the current Brazilian fixed broadband connection scenario. Section 3 details the proposal, the advantages of using scalable video in TV 3.0, and the lab environment. Section 4 describes the SW (SoftWare) tools developed, and Section 5 reports the tests executed and the respective results. Section 6 concludes this paper.

2 DTV system technologies

Different DTTB systems around the world have updated their specifications in order to follow and incorporate technological advances; for example, the DVB-T (Digital Video Broadcasting – Terrestrial) [16] system was updated to DVB-T2 [17], and the ATSC system was updated to ATSC 3.0 [15].

In this context, the SBTVD system [18] would not follow a different path, incorporating the most modern A/V coding technologies, a hybrid MW (MiddleWare) [19, 20], and an IP-based transport layer, among others. The main DTTB technologies and topics relevant to this research (video scalability, the transport layer, and fixed broadband deployment in Brazil) are described below.

2.1 Video scalability principles

Scalable video coding provides a mechanism for coding video in multiple layers, where each layer depicts a different quality representation of the same video scene [7]. A scalable video bitstream is arranged in different layers, including a BL and one or more ELs [21,22,23]. The BL offers the lowest video quality, while one or more ELs may be coded by referencing lower layers, providing enhanced video quality. Decoding a subset of the layers of a scalable video bitstream results in a video with lower, but still acceptable, quality compared to decoding the full bitstream [7]. Fig. 1 illustrates two layers of scalable video, where decoder 1 receives only the BL, decoding video with basic quality, whereas decoder 2 receives both video layers, decoding high-quality video scenes [21,22,23].

Fig. 1 General Scalable Video Concepts

A scalable, two-layer video stream can be obtained through a layered video encoder, where the BL encodes only the information that represents a scaled-down version of the original content (either in frame rate, resolution, or quality), and the EL is encoded and added to the BL to improve video quality. This general concept is illustrated in Fig. 2. The BL is obtained by subsampling the high-quality video at the pre-processor and then encoding it as the BL bitstream. The EL is obtained by up-sampling the reconstructed BL (BL1), subtracting it from the high-quality video, and encoding the residual as a regular bitstream.
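
Using a compact notation introduced here only for illustration (D and U for the pre-processor's down-sampling and up-sampling operations, E for encoding, R for reconstruction/decoding, and x for the high-quality input video), the two-layer process of Fig. 2 can be summarized as:

$$b_{BL}=E\left(D(x)\right),\qquad \hat{x}_{BL}=R\left(b_{BL}\right),\qquad b_{EL}=E\left(x-U\left(\hat{x}_{BL}\right)\right)$$

At the receiver, decoding only the BL bitstream yields the basic-quality video, while adding the decoded residual of the EL bitstream to the up-sampled BL reconstruction recovers the high-quality video, matching the behavior of decoders 1 and 2 in Fig. 1.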

Fig. 2 Generic Two-Layer Scalable Video Encoder

Different types of scalability can be found in the video coding area. Classically, three types of video scalability are used: spatial, temporal, and quality scalability [7]. These types of video scalability are required for TV 3.0 [1] and are further explored in the following subsections.

2.1.1 Spatial scalability

Spatial scalability provides the opportunity to reduce the spatial resolution of the video [7, 23]. It allows extracting bitstreams with different spatial resolutions from the complete video bitstream. This technique permits offering contents with distinct spatial resolutions, derived from the same content, to terminals with different features [8].

Considering a two-layer scalable video, the BL carries the main image, encoded at low resolution, while the EL carries the information needed for full-resolution image reconstruction. This concept is depicted in Fig. 3, where the bitstream is divided into two layers of spatial scalability. By decoding the BL (Fig. 3a), the device obtains a version of the original video with the lowest possible spatial resolution (low resolution). Decoding the EL provides additional information (Fig. 3b), which is added to the first layer, resulting in a reconstructed video with the original spatial resolution (high resolution).

Fig. 3 Spatial Scalability - (a) Low Resolution - BL (b) High Resolution - EL

2.1.2 Temporal scalability

Temporal scalability provides the opportunity to reduce the frame rate of the video [7, 23]. It allows extracting bitstreams corresponding to different temporal resolutions from the complete video bitstream. This technique allows excellent visual quality at the maximum temporal resolution and maintains acceptable visual quality at lower temporal resolutions [8].

In the case of a two-layer scalable video, the BL of a temporally scalable sequence is encoded with an LFR (Low Frame Rate), and the EL carries the data needed to increase the video frame rate. This approach is shown in Fig. 4, where the video is encoded with two scalable temporal layers: the BL transports the LFR video (Fig. 4a), and decoding the EL increases the video to HFR (Fig. 4b).

Fig. 4 Temporal Scalability - (a) Low Frame Rate - BL (b) High Frame Rate - EL

2.1.3 Quality scalability

Quality scalability provides the opportunity to reduce the detail or fidelity of the video [7, 23]. It allows extracting bitstreams corresponding to different quality levels from the full bitstream. This type of scalability is also known as SNR (Signal to Noise Ratio) scalability since the coding error is related to the perceptual quality of the image. Hence, the video image quality is increased without varying the spatial and temporal resolutions [8].

This concept is illustrated in Fig. 5, which shows two layers of scalable video. The BL carries the most important DCT (Discrete Cosine Transform) coefficients, allowing the reconstruction of a basic image with low SNR [22,23,24,25] (Fig. 5a), while the EL carries the additional coefficients, which allow improving the video image quality (Fig. 5b).

Fig. 5 Quality Scalability – (a) Low Quality - BL (b) High Quality - EL

The three types of video scalability previously described are supported by the H.265 [6, 7, 26] and H.266 [4, 5] video standards. Moreover, video scalability and hybrid reception have been adopted by different DTTB systems, such as ATSC 3.0 [14] and DVB-T2 (Digital Video Broadcasting Terrestrial – 2nd Generation) [16]; however, this feature is not currently deployed.

2.2 ROUTE-DASH transport layer

An important innovation of the MPEG-2 TS (Transport Stream) container format was the use of transport packets with a fixed 188-byte structure, which allowed the distribution of multiple media streams in a common stream. Its operation is based on fundamental assumptions, such as a constant delay across the delivery link. Individual packet streams typically carry one or more related media streams and can be scrambled at the packet level to provide conditional access. Meanwhile, streaming services became increasingly popular as broadband connections became widely available.

The constant delay assumption in MPEG-2 TS delivery resulted in the need for buffering when RTP (Real-time Transport Protocol) is used to deliver streaming content. Buffering and variability in the available bandwidth often result in an unsatisfactory user experience. This scenario led to the development of adaptive technologies known as HTTP adaptive streaming, which today delivers essentially all IP streaming content via broadband. The different HTTP adaptive streaming technologies are converging on DASH [27].

The MPEG-2 TS PCR (Program Clock Reference) mechanism depends on this constant delay assumption. The wide availability of UTC (Coordinated Universal Time) on the Internet allows DASH to eliminate the dependence on PCR delivery, offering benefits for the synchronous playback of media from multiple sources. Moreover, the convergence between the services provided by DTTB and IP streaming systems led to different use cases, as described below:

  • Broadcast Streaming: all service components are delivered via broadcast; nevertheless, the user experience can be enhanced through a browser-based interface, which allows interactive applications to run in the same streaming environment.

  • Broadband Streaming: all service components are delivered via a broadband connection.

  • Hybrid Services: a class of service in which a single service can be composed of both broadcast and broadband components, for example, secondary audio, enhanced video content, text, or other targeted service components.

These converging scenarios of broadcast and broadband content transmission suggest the need for a unified receiver stack that can handle the full diversity of possible service types and delivery methods. As depicted in Fig. 6, this unified receiver stack is organized to allow a clean interface among the various layers and the functionality shared among the various delivery methods. The figure shows the equivalence of the protocol stacks: the ROUTE-DASH protocol [14] stack (on the left side) and the traditional TCP (Transmission Control Protocol)/IP stack architecture applied to the adaptive service, which uses the DASH protocol (on the right side).

Fig. 6 Unified Broadcast and Broadband Receiver Stack

The ROUTE-DASH protocol operates at a level equivalent to the HTTP application transport protocol, connecting down the stack to the UDP (User Datagram Protocol) transport layer and up the stack to the DASH segments. Additionally, it provides generic application transport for any object and supports presentations, including scene descriptions, media objects, etc. The ROUTE-DASH protocol is suitable for delivering real-time media content, offering the following features [28]:

  • Individual delivery of and access to different media components.

  • Support for layered media encoding, allowing delivery of such encoded services in different LCT (Layered Coding Transport) transport sessions.

  • Easy combination with the DASH protocol, allowing synergy between broadcast and broadband connections.

  • Fast media access when joining a ROUTE-DASH session.

  • Enhanced reuse of existing media format technologies, namely DASH and ISO (International Organization for Standardization) BMFF (Base Media File Format).

2.3 Fixed broadband connections scenario in Brazil

In parallel with the current Brazilian DTTB deployment, according to Anatel, the number of fixed broadband Internet subscriptions has been growing strongly in Brazil over the last years [9]. Fig. 7 illustrates the growth of fixed broadband subscriptions in Brazil per Internet speed over the last ten years. Total subscriptions grew from approximately 20 million in 2013 to over 40 million in 2022. Moreover, plans with connection speeds higher than 34 Mbps have experienced explosive growth since 2018, becoming the most subscribed access speed tier from 2019 onwards. Currently, around 87% of Internet subscriptions have a speed higher than 34 Mbps. At the same time, the plans with lower connection speeds (up to 34 Mbps, up to 12 Mbps, up to 2 Mbps, and up to 512 Kbps) have been declining since 2019.

Fig. 7 Total Fixed Broadband Internet Subscriptions per Connection Speed

Besides the growth in Internet connections and average speed in Brazil, it is important to highlight that the number of HHs with fixed broadband exceeded the number with mobile broadband in December/2021. In December/2019, 77.9% of the HHs had a fixed broadband connection, whereas 81.2% had a mobile broadband connection. In December/2021, 83.5% of the HHs had a fixed broadband connection, whereas 72.2% had a mobile broadband connection [10].

Furthermore, the TV set is currently used more than the PC (Personal Computer) to access the Internet. In December/2021, 45.1% of the Brazilian population used the TV set to access the Internet, whereas the PC was used by 41.9% of the population. In December/2019, the scenario was the opposite: 46.2% of the population used the PC to access the Internet, whereas the TV set was used by 32.2% of the population [10].

3 Proposal

In order to contribute to the studies and advances toward TV 3.0 (the new Brazilian DTTB system), this paper proposes the use of the scalable video streaming feature to enhance the OTA BL video resolution from 2K to 4K, using an OTT-transmitted EL, with the video content synchronization executed through the ROUTE-DASH protocol. Moreover, a receiver architecture is proposed to extract the BL from the RF signal and store it in the receiver's internal server, from where other receivers hosted inside the local network can play it, as well as combine it with the EL to improve the user's experience.

Using the scalable video feature, video content with different image qualities can be transmitted to different types of receivers, with better encoding and transport efficiency compared to transmitting independent contents in simulcasting mode.

3.1 Possibilities to transmit video scalability

In the DTTB world, TV broadcasters commonly transmit the BL via the main OTA RF channel. To complete the program content transmission and enable the scalable video feature at the DTV receiver, there are currently four main ways to transmit the EL, as described below [8].

3.1.1 MIMO (Multiple Input Multiple Output)

This technique is used in mobile communications systems and was adopted by ATSC 3.0 [29], allowing the DTTB signal to be OTA broadcasted using a cross-polarization feature (e.g., linear or slant polarization). It increases the overall system capacity via additional spatial diversity and multiplexing, transmitting two data streams in a single radio frequency channel, which allows the transmission of the BL and EL contents.

Although this feature is mentioned in the TV 3.0 requirements [1], the receiver would need a special RF tuner capable of simultaneously receiving two independent RF polarizations from a MIMO antenna.

3.1.2 Channel bonding

This technique is widely used in mobile communications systems to improve data rates. DTTB systems can broadcast the BL using the main RF channel and allocate a secondary RF channel (adjacent or not to the main channel) to transmit the EL, as is done in the ATSC 3.0 system [30].

Although this feature is mentioned in the TV 3.0 requirements [1], to take advantage of it, the receiver would need two RF tuners capable of simultaneously receiving two independent RF signals at different frequencies.

3.1.3 LDM (Layered Division Multiplexing)

The LDM broadcasting technique is based on the simultaneous transmission of different synchronized signals (in time and frequency), which are broadcasted on the same RF channel at the same time, with the desired signal selected according to its power level. The BL of the scalable video can be transmitted in the upper layer, and the EL can be transmitted in the lower layer. This technique is used by the ATSC 3.0 system [31].

To take advantage of this feature, the receiver needs to be capable of demultiplexing the LDM signal. Nevertheless, this feature is not part of the TV 3.0 requirements [1].

3.1.4 Internet

Considering its worldwide adoption and usage, the Internet has become a vital resource for transmitting the EL video signal. Typically, the BL is OTA broadcasted, and the EL is transmitted to the receiver via broadband Internet, using a hybrid reception feature.

Moreover, in a more advanced and challenging scenario, the BL can also be transmitted via broadband Internet along with the EL. Besides this feature being part of the TV 3.0 requirements [1], the receiver would only need a standard RF tuner capable of receiving one RF channel and a broadband Internet interface, both commonly available nowadays in smart TVs.

Among the four ways previously described to transmit the scalable video layers, the Internet-based approach is the most suitable one to contribute to the advancement of the new Brazilian DTTB system, since it is the simplest to implement and benefits from the growth of fixed broadband Internet access in Brazil, as described in Section 2.3.

3.2 Experimental environment

This research used the ATSC 3.0 system for the RF physical layer because it is currently the newest commercial DTTB system, besides using ROUTE-DASH as the transport layer protocol [14]. The HEVC/SHVC video coding standard was used due to its commercial maturity [32]; however, the video scalability proposal of this research would not change if another video standard were used, such as H.266 [4]. Fig. 8 depicts the logical block diagram of the experimental environment used in this work, showing the ATSC 3.0 signal transmission chain from video encoding, A/V multiplexing, and data delivery to the exciter, up to signal reception.

Fig. 8 BL + EL Transmission Through Different Means - Block Diagram

4K UHD source video content [33] is delivered via multicast UDP to the real-time video encoder [34], which operates according to the ATSC 3.0 system. The video encoder delivers the BL + EL contents to the multiplexer through the WebDAV (Web Distributed Authoring and Versioning) protocol.

The multiplexer equipment [33] is internally composed of two modules: the packager and the multiplexer itself. The packager converts the WebDAV data into UDP multicast format and delivers it to the multiplexer itself, which adds signaling information and delivers the BL data, using the ROUTE-DASH protocol as the transport layer, to the ATSC 3.0 scheduler, and the EL content to the CDN.

The following sections describe how the BL and EL separation is performed.

Continuing in the OTA transmission chain, the scheduler equipment receives the BL in UDP multicast format from the multiplexer, adds the modulation parameters, and delivers the content to the modulator [35], which transmits it to the receiver according to the ATSC 3.0 specifications.

In the OTT chain, the EL content is transmitted via streaming, using the DASH protocol, to an external CDN. In this work, the CDN was implemented using AWS (Amazon Web Services) services [36], and the content is then delivered to the receiver via DASH streaming.

An NTP (Network Time Protocol) server was connected to all lab equipment to provide time synchronization among them.

Fig. 9 illustrates the receiver architecture proposed in this research. The DTTB receiver is a desktop station running the Windows 10 operating system (Core 5 CPU and 16 GB of RAM – Random Access Memory) with an ATSC 3.0 tuner. It runs an internal DASH server, besides using the GPAC (GPAC Project on Advanced Content) video player [37].

Fig. 9 Proposed Receiver's Architecture

The receiver gets the RF signal from the air, extracts the BL, and stores the video segments in the internal receiver's server. GPAC plays the BL content from the internal server. Furthermore, the receiver obtains the EL content from the CDN and synchronizes both video layers, executing the video enhancement from 2K to 4K.

The OTA and OTT content extraction is further detailed in the following sections.

The Lab equipment used in this paper is listed below:

  • DigiCAP Stream Generator - PCAP (Packet Capture) Player [33].

  • Ateme Titan Real Time Video Encoder [34].

  • DigiCaster-Media – Multiplexer [33].

  • Pro Television – Modulator [35].

  • GPAC – Video Player [37].

  • ATSC 3.0 Receiver.

  • Internet speed: 500 Mbps download, 50 Mbps upload.

  • NTP Server.

Table 1 shows the video features used for the BL and EL video segments, while Fig. 10 demonstrates the physical connections for the test setup.

Table 1 Video features for BL and EL
Fig. 10 Test Setup Physical Connections

3.3 Advantages to TV 3.0

The video scalability feature can offer several advantages to the next-generation Brazilian terrestrial DTV system and its receivers, as follows [8]:

  1. The same video content can be delivered with different qualities to DTV receivers with different capabilities. A lower-capability receiver can decode a video resolution of 1080i@30 or 1080p@60, while a more sophisticated receiver can decode the same program at a higher resolution, such as 2160p@60.

  2. The improved content quality offered via the EL leads to an improved user experience in relation to the base content. A higher spatial resolution shows better image details and increases the sensation of image depth. A higher frame rate is especially interesting for sports content or action movies, since it decreases image blur. Moreover, an expanded color gamut allows the image to have color reproduction closer to the real world.

  3. Bandwidth reduction compared to simulcasting. The base program is demanded in both situations; however, it is not necessary to independently broadcast the same program simultaneously at an improved resolution and at a lower resolution, which would use more bandwidth. Besides the BL broadcast, it is only necessary to transmit the EL to watch the program with improved image quality, thus avoiding simulcasting, reducing network congestion, and reducing storage sizes [38] (see the illustration after this list).

  4. In a region where only a few RF channels are available, it is still possible to deliver enhanced-quality program content. The basic program can be OTA broadcasted as the BL to all receivers, and the EL can be OTT transmitted without the need for additional RF channels.

  5. The BL can be encoded with improved robustness in relation to the EL. In this case, the received image is subject to smaller degradation compared to a non-scalable video signal [7].

  6. Since TV signals can be transmitted via broadband technology, video scalability provides more transmission flexibility, allowing bitrate adaptation during transmission either at the broadcaster's server or in the network.

  7. Considering the 6 MHz bandwidth used by the Brazilian RF channels, the frequency reuse-1 (C/N ≤ 0 dB) requirement, and the TV 3.0 tests executed in this condition, the maximum bitrate achieved was 3.6 Mbps in the MIMO condition for the Advanced ISDB-T proposal [39]. Such a bitrate allows broadcasting 1080p@60 pre-coded content, whereas about 8 Mbps would be necessary to broadcast 2160p@60 [1, 2]. The EL transmission via OTT allows delivering an enhanced user experience even under such restrictive TV 3.0 requirements.

3.4 Advantages in relation to other implementations

The work developed in [11] takes advantage of the ATSC 3.0 HPHT structure and the ability to execute hybrid transmission using the ROUTE-DASH protocol. The broadband transmission is targeted at covering broadcast shadow regions in the signal delivered to the DTV receiver. It uses the SHVC video codec to produce BL (720p) and EL (2160p) contents. Initially, when the RF signal is strong enough, the BL and EL are OTA broadcasted using the LDM feature. As the receiver moves and the RF signal decreases, the receiver is able to extract only the BL.

During the movement, in regions where relatively low broadcast signal strength is detected and a 4G (4th Generation) LTE (Long Term Evolution) signal is available, the receiver decodes the BL from the broadcast signal, since it is not possible to receive the EL OTA due to the low signal strength, and requests the EL via OTT. Finally, in regions where the broadcast signal is not available, the receiver switches to the broadband network to decode the BL.

The study proposes an implementation to execute seamless switching between broadcast and broadband networks according to the broadcast SNR and RSSI (Received Signal Strength Indicator) parameters registered during the receiver movement, and according to the 4G LTE signal availability.

As the demand for UHD 8K content increases, 4K UHD TV sets may become obsolete more quickly. Therefore, reference [12] proposes an alternative to extend the 4K TV set lifespan. The SHVC codec is used to produce BL (2160p) and EL (4320p) contents. The BL is broadcasted using the ATSC 3.0 RF signal, with MMT or ROUTE-DASH as the transport layer, and is combined in the receiver with the EL, which arrives via OTT using the DASH protocol.

In both works, the latency between BL and EL in the different scenarios was not reported, nor was the minimum latency that the receiver should support to synchronize both layers properly.

Therefore, this research has some further advantages in relation to references [11, 12]:

  • Proposal of a receiver architecture in which BL A/V segments are extracted from the RF signal and stored in an internal local server, where other receivers connected to the internal network can access them, besides combining them with the EL (Section 3.2).

  • Evaluation of the latency between BL and EL caused by the Internet CDN transport of the EL in different scenarios.

  • Suggestion of the minimum latency between BL and EL that a TV 3.0 receiver needs to support to correctly reproduce enhanced video content, based on the test scenarios described in the following sections.

Furthermore, the video scalability delivery using hybrid transmission proposed in this research has the following contributions compared to the other related works:

  • This work compensates for the fact that the TV 3.0 candidate technologies were not tested against the hybrid transmission requirements (TL (Transport Layer requirements) 1 and 2) [1].

  • The transport layer protocol used in this paper (ROUTE-DASH) successfully delivers the OTA/OTT video contents (BL/EL).

  • The results of this paper meet TV 3.0 video scalability and hybrid transmission requirements (such as VC (Video Coding Requirement) 9, VC 10, VC 11, TL 1, and TL 2) [1].

    • VC9 requirement enables seamless decoding and A/V alignment.

    • VC10 enables interoperability with different distribution platforms (hybrid transmission).

    • VC11 enables video scalability.

  • The built experiments were able to deliver UHD content to the final user, taking advantage of the fixed Internet infrastructure available in Brazil and fulfilling the TV 3.0 frequency reuse-1 (C/N ≤ 0 dB) requirement (PL (Physical Layer Requirement) 2) [1].

  • The results show that it is possible to reduce the TV receiver's cost, since only a broadband Internet interface and a standard DTV RF tuner are required to receive UHD scalable video.

4 Developed SW Tools

To allow the BL and EL separation at the multiplexer level, two SW tools have been developed in the context of this paper: a ROUTE-DASH analyzer and an ISO BMFF A/V segment extractor.

4.1 ROUTE-DASH Analyzer

In order to develop the ISO BMFF A/V segment extractor, it was necessary to understand how ISO BMFF A/V segments are fragmented, signaled, and transported by the ROUTE-DASH protocol in the UDP multicast streaming. Therefore, a ROUTE-DASH analyzer was developed according to the ATSC 3.0 ROUTE-DASH standard [14] and uploaded to GitHub [40]. It was written in the Lua language [41] to be used by the open-source Wireshark analyzer [42]. Each UDP transport datagram carries an LCT packet [43], which the ROUTE-DASH protocol uses to transport A/V segments and XML (Extensible Markup Language) files.

The Lua dissector identifies the protocol used to transport the desired packet and its respective contents and fields, such as the LCT version number, congestion control flag, CCI (Congestion Control Information), TSI (Transport Session Identifier), TOI (Transport Object Identifier), payload, etc. This allows identifying the type of data transported (A/V or signaling-only (XML) packets), how the information is fragmented, where a certain A/V content starts and ends, how many packets are used to carry the desired content, and whether a video segment belongs to the BL or the EL.
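
To illustrate the fields inspected by the dissector, the sketch below parses the fixed portion of an LCT header from a UDP payload. It is a standalone C++ illustration (not the Lua dissector itself) and assumes the ATSC 3.0-constrained LCT layout, in which the CCI, TSI, and TOI are each carried as 32-bit fields; the sample packet values are hypothetical.

    #include <cstdint>
    #include <iostream>
    #include <vector>

    // Fields of the fixed LCT header portion (RFC 5651 layout, assuming the
    // ATSC 3.0 ROUTE constraints: 32-bit CCI, TSI, and TOI).
    struct LctHeader {
        uint8_t  version;       // V: LCT version number
        uint8_t  cc_flag;       // C: congestion control flag
        uint8_t  tsi_flag;      // S: TSI length flag
        uint8_t  toi_flag;      // O: TOI length flag
        bool     close_object;  // B: last packet of the transported object
        uint8_t  hdr_len;       // header length in 32-bit words
        uint8_t  codepoint;     // CP: payload type indication
        uint32_t cci;           // Congestion Control Information
        uint32_t tsi;           // Transport Session Identifier (e.g., one per layer)
        uint32_t toi;           // Transport Object Identifier (e.g., one per segment)
    };

    static uint32_t read_u32(const uint8_t* p) {
        return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
               (uint32_t(p[2]) << 8)  |  uint32_t(p[3]);
    }

    // Parses the first 16 bytes of a UDP payload as an LCT header.
    // Returns false if the buffer is too short or uses a layout not assumed here.
    bool parse_lct(const std::vector<uint8_t>& pkt, LctHeader& h) {
        if (pkt.size() < 16) return false;
        h.version      = pkt[0] >> 4;
        h.cc_flag      = (pkt[0] >> 2) & 0x03;
        h.tsi_flag     = pkt[1] >> 7;
        h.toi_flag     = (pkt[1] >> 5) & 0x03;
        h.close_object = (pkt[1] & 0x01) != 0;
        h.hdr_len      = pkt[2];
        h.codepoint    = pkt[3];
        if (h.cc_flag != 0 || h.tsi_flag != 1 || h.toi_flag != 1) return false;
        h.cci = read_u32(&pkt[4]);
        h.tsi = read_u32(&pkt[8]);
        h.toi = read_u32(&pkt[12]);
        return true;
    }

    int main() {
        // Hypothetical 16-byte LCT header (values illustrative only).
        std::vector<uint8_t> pkt = {0x10, 0xA0, 0x04, 0x08,
                                    0x00, 0x00, 0x00, 0x00,   // CCI
                                    0x00, 0x00, 0x00, 0x64,   // TSI = 100
                                    0x00, 0x00, 0x00, 0x2A};  // TOI = 42
        LctHeader h{};
        if (parse_lct(pkt, h))
            std::cout << "V=" << unsigned(h.version) << " TSI=" << h.tsi
                      << " TOI=" << h.toi << " last=" << h.close_object << "\n";
        return 0;
    }

The payload carrying the ISO BMFF segment fragment (or the signaling XML) follows the header, at an offset of hdr_len x 4 bytes from the start of the LCT packet.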

4.2 ISO BMFF A/V segments extractor

With the support of the ROUTE-DASH analyzer, an extractor was developed for Linux as a C++ routine. It receives the UDP multicast streaming from the packager, extracts the EL video segments (naming them according to the original filenames given by the real-time video encoder), delivers the BL video segments to the OTA transmission chain, and uploads the EL video segments to the Internet CDN.
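
A minimal sketch of the reassembly and separation logic just described is shown below, in C++. It assumes packets have already been parsed into (TSI, TOI, close-object flag, payload) tuples, that the BL and EL video segments travel in distinct LCT transport sessions, and that the TSI value, output directories, and file naming are hypothetical (the real tool reuses the encoder's original filenames).

    #include <cstdint>
    #include <fstream>
    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    // One parsed ROUTE/LCT packet, as identified by the analyzer.
    struct RoutePacket {
        uint32_t tsi;                  // transport session: assumed one per video layer
        uint32_t toi;                  // transport object: one per A/V segment
        bool last;                     // close-object flag: final packet of the segment
        std::vector<uint8_t> payload;  // ISO BMFF segment fragment
    };

    // Hypothetical session-to-layer mapping and output locations.
    constexpr uint32_t kElTsi = 101;   // EL video session (illustrative value)
    const std::string kBlDir = "bl/";  // BL segments: handed to the OTA transmission chain
    const std::string kElDir = "el/";  // EL segments: uploaded to the Internet CDN afterwards

    class SegmentExtractor {
    public:
        // Accumulates payloads per (TSI, TOI) and writes the segment when complete.
        // Assumes in-order packet arrival; the real protocol also signals byte offsets.
        void on_packet(const RoutePacket& p) {
            auto& buf = objects_[{p.tsi, p.toi}];
            buf.insert(buf.end(), p.payload.begin(), p.payload.end());
            if (p.last) {
                write_segment(p.tsi, p.toi, buf);
                objects_.erase({p.tsi, p.toi});
            }
        }

    private:
        void write_segment(uint32_t tsi, uint32_t toi, const std::vector<uint8_t>& data) {
            const std::string dir = (tsi == kElTsi) ? kElDir : kBlDir;
            std::ofstream out(dir + "segment-" + std::to_string(toi) + ".m4s",
                              std::ios::binary);
            out.write(reinterpret_cast<const char*>(data.data()),
                      static_cast<std::streamsize>(data.size()));
        }

        std::map<std::pair<uint32_t, uint32_t>, std::vector<uint8_t>> objects_;
    };

Feeding packets parsed by an LCT front end (such as the parser sketched in Section 4.1) into on_packet() yields one file per completed segment, already separated by layer.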

After extraction, a binary file comparator was used to compare the extracted video segments with the original ones to check their consistency. The routine properly extracted all BL and EL video segments.

4.3 MPD File

Besides extracting the A/V segments from the UDP multicast streaming, the ISO BMFF A/V extractor replaces the manifest.mpd file initially generated by the real-time video encoder with a new one, which allows hybrid OTA/OTT transmission/reception. Thus, the DTV receiver can download the EL from the CDN and combine and synchronize it with the OTA BL content for proper reproduction.

The MPD (Media Presentation Description) is a document that contains the metadata required by a DASH client to construct appropriate HTTP URLs to access the A/V segments, synchronize them, and provide the streaming service to the user [44]. Fig. 11 illustrates the proposed MPD file used in this work, in which the BL video (v100) is OTA broadcasted, whereas the EL video (v101) is OTT transmitted.

Fig. 11 Proposed MPD File
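
Since the actual MPD appears only as a figure, the simplified sketch below indicates the kind of structure such a manifest could have, with the BL representation (v100) pointing to the receiver's internal server and the EL representation (v101) pointing to the CDN. All URLs, bitrates, codec details, and template attributes are illustrative assumptions, not the exact file used in the tests.

    <?xml version="1.0" encoding="UTF-8"?>
    <MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="dynamic"
         profiles="urn:mpeg:dash:profile:isoff-live:2011" minBufferTime="PT2S">
      <Period id="1" start="PT0S">
        <AdaptationSet contentType="video" segmentAlignment="true">
          <!-- BL (v100): extracted from the OTA signal and served by the receiver's internal DASH server -->
          <Representation id="v100" width="1920" height="1080" bandwidth="3000000">
            <BaseURL>http://127.0.0.1:8080/bl/</BaseURL>
            <SegmentTemplate media="v100-$Number$.m4s" initialization="v100-init.mp4"
                             duration="2" startNumber="1"/>
          </Representation>
          <!-- EL (v101): fetched from the CDN and decoded together with the BL -->
          <Representation id="v101" dependencyId="v100" width="3840" height="2160" bandwidth="9000000">
            <BaseURL>https://cdn.example.com/el/</BaseURL>
            <SegmentTemplate media="v101-$Number$.m4s" initialization="v101-init.mp4"
                             duration="2" startNumber="1"/>
          </Representation>
        </AdaptationSet>
      </Period>
    </MPD>

The dependencyId attribute expresses that v101 is only decodable together with v100, while the per-representation BaseURL elements are what allow one layer to be fetched locally and the other from the CDN.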

5 Tests and results

The latency between the BL and EL is a system challenge. The CDN should offer low latency when delivering the EL to allow for a good user experience [45]. Moreover, the DTV receiver needs to receive both contents, store them, and synchronize them to play them properly.

Considering a given BL and EL video segment A, the time latency between them can be calculated as in Eq. (1):

$${\mathrm{T}}_{\mathrm{Latency}}={\mathrm{T}}_{\mathrm{EL}}-{\mathrm{T}}_{\mathrm{BL}}$$
(1)

where $\mathrm{T}_{\mathrm{BL}}$ is the moment when the full BL video segment A is available at the receiver, $\mathrm{T}_{\mathrm{EL}}$ is the moment when the full EL video segment A is available at the receiver, and $\mathrm{T}_{\mathrm{Latency}}$ is the difference between these moments.
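
As a small illustration of how Eq. (1) and the averages reported later for each test could be computed from the player logs, the sketch below derives the per-segment latencies, their mean, and the standard error of the mean; the timestamp values are hypothetical.

    #include <cmath>
    #include <cstdio>
    #include <vector>

    struct SegmentTimes {
        double t_bl_ms;  // moment the full BL segment is available at the receiver
        double t_el_ms;  // moment the full EL segment is available at the receiver
    };

    int main() {
        // Hypothetical availability times (in milliseconds) for a few segments.
        std::vector<SegmentTimes> segs = {
            {1000.0, 1350.0}, {3000.0, 3420.0}, {5000.0, 5380.0}, {7000.0, 7410.0}};

        // Per-segment latency according to Eq. (1): T_Latency = T_EL - T_BL.
        std::vector<double> latency;
        for (const auto& s : segs) latency.push_back(s.t_el_ms - s.t_bl_ms);

        // Average latency and standard error of the mean (stddev / sqrt(n)).
        double mean = 0.0;
        for (double l : latency) mean += l;
        mean /= latency.size();

        double var = 0.0;
        for (double l : latency) var += (l - mean) * (l - mean);
        var /= (latency.size() - 1);                        // sample variance
        const double std_err = std::sqrt(var / latency.size());

        std::printf("average latency = %.1f ms, standard error = %.1f ms\n", mean, std_err);
        return 0;
    }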

Some tests were proposed and executed to measure latency between the BL and EL during their transmission and check the receiver’s behavior in different conditions.

These tests and respective results are presented in the following sections.

5.1 Tests Executed

The tests below were proposed to evaluate this work considering the latency between the BL and EL introduced by the CDN during the EL transmission:

  1. Receiver to play the BL and EL contents stored at the internal receiver's server.

  2. Receiver to play the BL content stored at the internal receiver's server and the EL content already uploaded to the CDN.

  3. Receiver to extract the BL content from the OTA signal, combine it with the EL content already uploaded to the CDN, and play them.

  4. Receiver to play the BL content stored at the internal receiver's server and the EL content uploaded in real time to the CDN.

  5. Receiver to extract the BL content from the OTA signal and combine it with the EL content uploaded in real time to the CDN.

Besides measuring the latency introduced between the BL and EL by the different test scenarios, the respective test purposes are the following:

  1. Evaluate if the receiver properly synchronizes BL and EL without suffering CDN transport effects.

  2. Evaluate if the receiver properly synchronizes BL and EL with a small latency added between both layers.

  3. Evaluate if the receiver properly extracts the BL from the OTA signal and synchronizes both layers, considering the latency between them.

  4. Evaluate if the receiver properly synchronizes BL and EL, considering the latency added between both layers during the EL upload process.

  5. Evaluate the receiver operation in the user's reception scenario.

5.2 Test Results

The receiver proposed in this paper managed to properly receive ROUTE-DASH contents, extract BL and EL contents from UDP multicast streaming, synchronize them and reproduce enhanced UHD video content in all the proposed tests.

Different latency values between the BL and EL were measured in each test according to the GPAC player logs [46, 47]. As expected, the EL is delayed in relation to the BL due to the different form of transmission used to transport it [48]. Fig. 12 illustrates how the latency between the BL and EL evolves from Tests 1 to 5, while Table 2 shows the average latency between both layers for the five tests, with the respective standard error.

Fig. 12 Latency Evolution of the EL for Tests 1 to 5

Table 2 Average latency measured between BL and EL

As expected, Test 1 showed the smallest latency value among all the tests executed, since the contents of both layers were already stored in the internal receiver's server.

In Test 2, the latency between both layers increased in relation to Test 1. In this test, the EL stored in the CDN needed to be downloaded to be combined with the BL stored in the receiver's server. The latency between BL and EL decreased slightly in Test 3 in relation to the previous one. In this case, the receiver must first receive and extract the BL content from the OTA signal and then request the EL content, previously available at the CDN, to combine both layers.

In Test 4, the latency between both layers increased significantly, as expected, since a higher degree of complexity was added to the setup: the CDN upload/download time. The BL was already stored in the receiver's server, whereas the EL was uploaded to the CDN in real time during the video transmission and then downloaded by the receiver. The latency between both video layers was affected by the EL upload/download speed, the Internet traffic, the distance between the TV receiver and the CDN, and the general CDN infrastructure. Nevertheless, the latency introduced by the CDN does not strongly affect the user's experience.

The latency between BL and EL decreased in Test 5 in relation to Test 4. The reasons are similar to those mentioned in Test 3: the receiver must first receive and extract the BL content from the OTA signal and then request the EL content. Since the EL upload speed to the CDN is high, it did not significantly affect the latency between both layers.

6 Conclusions

This paper proposed a contribution to the TV 3.0 system by means of scalable video, offering UHD 4K content to the final user and enhancing the user's experience. The BL content was OTA broadcasted using the ROUTE-DASH protocol as the transport layer, whereas the EL content was OTT transmitted via CDN using the MPEG-DASH protocol as the transport layer, both of which have already been adopted by the new Brazilian DTTB generation system.

The hybrid DTTB receiver proposed in this research was designed to receive the BL and EL contents and combine them to offer UHD video according to the TV 3.0 requirements, taking advantage of the current Brazilian fixed broadband Internet scenario.

An AWS CDN was used to reproduce a broadcaster's CDN able to deliver contents all over the Brazilian territory, and common Internet access was used in order to reproduce the user's environment.

Different tools, such as the ROUTE-DASH analyzer and the video segment extractor, were developed to make it possible to separate the BL and EL contents from the UDP multicast streaming, where the RF signal broadcasts the BL and the EL is OTT transmitted via CDN.

The latency measured between the BL and EL contents was under 1500 ms in all tests executed, which is an appropriate value to offer a good user experience. In [45], AWS recommends the use of a low-latency system (2 s ≤ video latency ≤ 6 s) for video distribution.

The proposed receiver correctly downloaded the BL and EL contents and combined them to play the UHD image properly.

According to the results achieved in this research, TV 3.0 receivers with support for scalable video and hybrid reception are recommended to support a latency of at least 2000 ms between the BL and EL video contents to cope with network instabilities and offer a good user experience.
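
As an illustration of how such a recommendation could be applied inside a receiver, the sketch below decides, per segment, whether to present the combined BL + EL picture or fall back to BL-only playback. The 2000 ms budget, the timing variables, and the fallback behavior itself are assumptions made here for illustration only.

    #include <cstdio>

    // Hypothetical latency budget derived from the recommendation above (milliseconds).
    constexpr double kLatencyBudgetMs = 2000.0;

    // Returns true if the EL segment arrived within the supported BL/EL latency,
    // so the receiver can present the enhanced (BL + EL) picture for this segment;
    // otherwise it falls back to BL-only playback for the segment.
    bool present_enhanced(double t_bl_ready_ms, double t_el_ready_ms) {
        const double latency_ms = t_el_ready_ms - t_bl_ready_ms;  // Eq. (1)
        return latency_ms <= kLatencyBudgetMs;
    }

    int main() {
        // Illustrative cases: the EL arrives 1400 ms and 2600 ms after the BL.
        std::printf("segment 1: %s\n", present_enhanced(0.0, 1400.0) ? "BL+EL" : "BL only");
        std::printf("segment 2: %s\n", present_enhanced(0.0, 2600.0) ? "BL+EL" : "BL only");
        return 0;
    }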

This research measured the latency between the BL and EL contents in a lab condition considering a best-case scenario: a DTTB receiver with plenty of RAM, high download/upload Internet speeds, a well-provisioned CDN, etc.

In future works, it is possible to extend this research using more restrictive scenarios, such as lower network speeds, higher packet drop rates, and decreased overall network QoS, and to evaluate their influence on the latency between both layers and on the quality of the video reproduction combining both layers.

Another test scenario would be to purposefully impose higher latency between BL and EL to check the test receiver's behavior and the possible impacts on the user's experience.

Finally, there is the possibility of developing software/hardware interfaces for the existing transmission and reception equipment, which would gather transmission/reception information to better control the BL/EL transmission timing and try to decrease the latency between both video layers at the receiver.