1 Introduction

The H.264 standard, also known as MPEG-4 AVC (Advanced Video Coding) [1, 6, 7], has been widely employed in digital TV, mobile video, video streaming, and Blu-ray discs. H.264 produces video of high visual quality with far fewer bits than previous video-coding standards such as MPEG-2. Highly compressed H.264 video, however, is very sensitive to data loss because of the extensive use of predictive coding and variable-length codes. Various error-protection mechanisms have been proposed to alleviate the degradation of decoded video. For example, forward error correction or error detection with retransmission may be implemented in the network transport layer. Error-resilience tools, such as resynchronization markers, data partitioning, reversible variable-length coding, and the insertion of intra-blocks or intra-frames, may be used at the encoder to confine the damage caused by data impairment. In conjunction with appropriate error concealment at the decoder, these error-resilience tools can improve the performance of the overall system. Error concealment mitigates visual degradation by interpolating lost or erroneous samples from spatially or temporally correlated samples. Spatial error concealment estimates a pixel of a lost block as a weighted average of correctly received neighboring pixels, but it suffers from blurring and artifacts. In contrast, temporal error concealment estimates the motion vector (MV) of a lost region from correlated blocks and restores the lost region by motion compensation.

Unlike MPEG-4 Part 2 [4], which inherently supports object-based coding with arbitrarily shaped objects as the basic coding units, the H.264 standard adopts frame-based coding to achieve high compression efficiency. In the current JM (Joint Model) reference software of H.264 [7], the motion vectors surrounding a lost macroblock (MB) and the zero MV are collected as candidate MVs, and the missing MB is restored using the MV with the smallest boundary-matching cost. This error concealment method fails to give satisfactory performance when no reliable MV exists in the candidate set, when the matching criterion is inappropriate, or when multiple objects with different motion coexist in a lost macroblock. Many improvements, such as more accurate MV estimation [12], better interpolation algorithms [9], classification of motion regions [5], and finer block classification [8, 3], have been proposed in the literature. Notably, all of these approaches use rectangular blocks as the basic units for concealment, following the basic coding structure of H.264, and none of them explicitly employs error-resilience techniques.

The H.264 standard includes several new application-layer error-resilience tools. One such tool is FMO (Flexible Macroblock Ordering), which reorganizes the image blocks in a prioritized manner to better exploit spatial correlation or to facilitate unequal error protection. In this paper, we activate FMO at the encoding stage to support object matching. At the decoding side, the restriction to rectangular blocks is removed and concealment is performed on objects of arbitrary shape. Objects are first segmented in the reference frame based on color similarity. Neighboring objects of small area or with consistent motion are grouped as a whole. Motion estimation is then performed on the detected objects to find the object motion vector within a refined search range. The object motion vectors associated with a lost macroblock are collected in a candidate set along with the conventional block-based motion vectors. A lost region is concealed by the object that incurs the smallest boundary-matching error. Experimental results show that the proposed object-based method achieves superior concealment results in terms of objective PSNR.

The rest of this paper is organized as follows. In Section 2, the relevant error concealment algorithms in the literature are reviewed. The proposed object-based error concealment method is presented in Section 3. Experimental results and analyses are given in Section 4, followed by the conclusion.

2 Previous work

Temporal error concealment requires the true motion vectors of objects for perfect restoration. However, object-based true-motion estimation is generally difficult and complicated to realize, partly because the needed information (pixels, MVs) may be missing in data-loss situations. Block-based methods that independently match and patch a missing block are therefore usually used instead. In the current H.264 Joint Model (JM) reference software, the MVs of the top and left blocks in the current frame, the MV of the collocated block in the previous frame, and the zero MV are collected as the MV candidates of a missing macroblock (MB) [7]. The MV with the smallest cost under the BMA (Boundary Match Algorithm) is chosen as the MV for error concealment. BMA computes the sum of absolute differences between the boundary pixels of a candidate block (inside pixels) and the adjacent received pixels (outside pixels), as shown in Fig. 1. Zhang et al. [12] modified the error concealment algorithm to include more vectors in the set of MV candidates and used EBMA (external BMA) as the distortion criterion. EBMA evaluates the distortion as the sum of absolute differences between the outside pixels of a candidate block in the reference frame and the successfully received outside pixels of the current frame. A hardware-efficient modification of [12] was proposed in [9], which saves considerable computation and memory bandwidth with only slight visual degradation by reducing the number of MV candidates and reusing data and intermediate results. In [5], a motion-characteristic-differentiated error concealment method based on motion field transfer was proposed, in which different concealment methods are applied to different regions according to their motion characteristics, with the aid of FMO at the encoder.

Fig. 1 BMA (Boundary Match Algorithm)
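To make the two matching criteria concrete, the following minimal sketch contrasts the BMA and EBMA costs for one candidate MV. It is an illustrative sketch under stated assumptions, not the JM implementation: only the top and left neighbors are compared, frames are 8-bit luma planes indexed as [row, column], and boundary clipping is omitted.

```python
import numpy as np

def bma_cost(cur, ref, x, y, mv, bs=16):
    """BMA: compare the received pixels just OUTSIDE the lost block in the
    current frame with the outermost pixels INSIDE the motion-compensated
    candidate block taken from the reference frame."""
    dx, dy = mv
    cand = ref[y + dy : y + dy + bs, x + dx : x + dx + bs].astype(np.int32)
    cost = np.abs(cur[y - 1, x : x + bs].astype(np.int32) - cand[0, :]).sum()
    cost += np.abs(cur[y : y + bs, x - 1].astype(np.int32) - cand[:, 0]).sum()
    return cost

def ebma_cost(cur, ref, x, y, mv, bs=16):
    """EBMA [12]: compare the received OUTSIDE pixels of the current frame
    with the OUTSIDE pixels of the candidate block in the reference frame."""
    dx, dy = mv
    cost = np.abs(cur[y - 1, x : x + bs].astype(np.int32)
                  - ref[y + dy - 1, x + dx : x + dx + bs].astype(np.int32)).sum()
    cost += np.abs(cur[y : y + bs, x - 1].astype(np.int32)
                   - ref[y + dy : y + dy + bs, x + dx - 1].astype(np.int32)).sum()
    return cost
```

The concealment MV is then the candidate with the smallest cost, e.g. `min(candidates, key=lambda mv: bma_cost(cur, ref, x, y, mv))`.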

In the above techniques, MBs of 16×16 luma pixels are taken as the units for concealment. Better error concealment can be achieved if the restriction to a single block size is removed. A variable-block-size error concealment technique was proposed in [8]. The MV of a missing MB is first estimated using boundary matching. The 16×16 MB is divided into 16×8 or 8×16 blocks if such division yields a smaller side-match distortion, and further division into 8×8 or smaller blocks is performed under similar conditions. The authors also introduced a spatial-temporal boundary matching algorithm to increase temporal coherence. In [11], a variable-block-size error concealment technique based on coding modes was proposed. The mode (SKIP or not) and block size (16×16, 16×8, 8×16, or 8×8) of the lost block are determined from the coded modes of the surrounding MBs: if the surrounding MBs are mostly of SKIP mode or of a particular partition type, the lost MB is assigned the same type. The lost block is then concealed by the motion vector that incurs the smallest EBMA cost. In [10], a hybrid motion vector extrapolation (HMVE) algorithm was proposed. The algorithm is hybrid in the sense that motion estimation is performed on blocks but error concealment is done individually for each pixel. Pixels in a missing block are concealed by blocks extrapolated from reference pictures under a constant-velocity motion model. HMVE classifies the pixels to be concealed into three categories, as shown in Fig. 2. Category A ({1,2,3,4,5,6,7,8,11,12,13} in Concealed Block 1 of Fig. 2) contains pixels covered by at least one extrapolated 4×4 block. Category B ({9,10,14,15,16} in Concealed Block 1 of Fig. 2) contains pixels not covered by any extrapolated block. Pixels whose resident block does not overlap any extrapolated block belong to Category C (Concealed Block 2 in Fig. 2). For Categories A and B, the dominant MV (the one with the largest overlapping area) and the average MV (weighted by overlapping area) are included in the set of candidate MVs; for Category A, the MVs of the other overlapping extrapolated blocks are also incorporated. To remove outliers in Category A, MVs that are distant from the others are discarded, and the final MV used for concealment is obtained by averaging the remaining valid MVs. For Category C, the MV of the collocated pixel in the previous frame is used. It should be noted that HMVE assumes the video is coded with one slice per frame, so error-resilience tools such as FMO are not employed.

Fig. 2 Pixel classification for the HMVE algorithm [10]
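As a rough illustration of how the dominant and area-weighted average MVs are derived from the overlaps, consider the sketch below. The (mv, area) pair representation is an assumption for illustration, and the outlier-rejection threshold of Category A is not specified here.

```python
import numpy as np

def dominant_and_average_mv(overlaps):
    """overlaps: list of (mv, area) pairs, one per extrapolated 4x4 block
    that overlaps the lost region.  Returns the dominant MV (largest
    overlapping area) and the area-weighted average MV, as used for
    HMVE Categories A and B."""
    mvs = np.array([mv for mv, _ in overlaps], dtype=float)
    areas = np.array([area for _, area in overlaps], dtype=float)
    dominant = mvs[np.argmax(areas)]
    average = (mvs * areas[:, None]).sum(axis=0) / areas.sum()
    return dominant, average
```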

3 Proposed method

The H.264 error concealment techniques reviewed above [8, 2] operate on full frames, assuming that a frame is either completely received or totally lost when an error occurs. Better concealment results can be expected if partial spatial information of the current frame is available. In this paper, we employ the FMO tool at the encoding stage so that spatial information from successfully received slices can be used in object matching. The block diagram of the proposed object-based error concealment technique at the decoder is shown in Fig. 3. Unlike previous methods, the restriction to rectangular blocks for motion estimation is removed. The algorithm involves three major stages: object segmentation, object matching, and region-based patching. We explain each of them in the following subsections.

Fig. 3 Block diagram of the proposed method

3.1 Object segmentation

When packet loss occurs, object segmentation is performed on the whole reference frame during decoding, according to the color and motion consistency of pixels. Initially, the luma component (Y, 256 grey levels in 8 bits) of the reference frame is uniformly quantized into 8 levels by keeping the three most significant bits. Eight levels are chosen as a good trade-off between noise reduction and computational cost. The initial segmentation is obtained by grouping connected components of the same quantized color. As illustrated in Fig. 4a and b, the initial segmentation may contain many small objects; these tiny objects typically account for up to 90 % of the objects found by color quantization and connected-component labeling. We join such tiny or fragile objects according to object size and motion consistency. First, a segmented object with no more than 10 pixels is merged into the neighboring object with the most similar grey level. It is also observed that a visual object with uniform motion, such as a ball, may span several color segments; therefore, neighboring objects with the same motion are grouped as a whole. After this merging process, fragile objects are properly joined, as illustrated in Fig. 4c. The motion vector of a pixel (the pixel MV) is taken as the MV of its resident block, and the MV of an object (the object MV) is calculated as the average of its constituent pixel MVs; the object MV is used in the next stage. Note that for H.264-coded video the MVs of macroblocks are generated by the encoder and transmitted to the receiver. Although these MVs (associated with successfully received macroblocks) are chosen by rate-distortion optimization and thus do not always represent true motion, we use them to estimate the pixel and object MVs, thereby avoiding complicated true-motion estimation.
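A minimal sketch of this stage is given below, assuming a numpy luma plane and scipy's connected-component labeling; the single-pass merging loop is a simplification of the merging rules above, and the motion-consistency grouping is omitted.

```python
import numpy as np
from scipy import ndimage

def segment_reference_frame(y_plane):
    """Quantize 8-bit luma to 8 levels (keep the 3 MSBs), then label
    connected components of equal quantized color."""
    q = (y_plane >> 5).astype(np.uint8)       # 256 grey levels -> 8 levels
    labels = np.zeros(q.shape, dtype=np.int32)
    next_label = 0
    for level in range(8):
        comp, n = ndimage.label(q == level)   # components of this color
        labels[comp > 0] = comp[comp > 0] + next_label
        next_label += n
    return q, labels

def merge_tiny_objects(q, labels, min_size=10):
    """Merge each object of <= min_size pixels into the neighboring object
    with the most similar quantized grey level (single pass)."""
    sizes = np.bincount(labels.ravel())
    for lab in np.flatnonzero(sizes <= min_size):
        mask = labels == lab
        if lab == 0 or not mask.any():        # skip unused/absorbed labels
            continue
        ring = ndimage.binary_dilation(mask) & ~mask   # bordering pixels
        neighbors = set(labels[ring].tolist()) - {0, lab}
        if not neighbors:
            continue
        my_level = int(q[mask][0])
        best = min(neighbors,
                   key=lambda l: abs(int(q[labels == l][0]) - my_level))
        labels[mask] = best
    return labels
```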

Fig. 4 a Decoded reference frame, b initial segmentation result by color quantization and connected components, c final object segmentation after the merging process

3.2 Object matching

In this stage, the best match (motion vector) between an object in the reference frame and a lost region in the current frame is derived. The object MV, denoted as $(d_x, d_y)$ and obtained in the reference frame by object segmentation, is taken as the initial guess. To measure the difference between two objects, a region-based mean absolute error (MAE) is defined as follows:

$$ \mathrm{MAE}(i,j)=\frac{\displaystyle\sum_{m=0}^{M-1}\sum_{n=0}^{N-1}\left| R(p+m,\, q+n)-F(p+m+d_x+i,\; q+n+d_y+j)\right|}{\text{number of valid pixels in the summation}} $$
(1)

In Eq. (1), F(x,y) and R(x,y) are the quantized grey levels at position (x,y) of the current frame and the reference frame, respectively; (p,q) is the upper-left position of the bounding box of the object, and M and N are the dimensions of the bounding box. Assuming that the motion of an object is relatively stable without abrupt changes, we constrain the final object position in the current frame to within ±3 pixels of the position predicted by the initial MV (i.e., −3 ≤ i, j ≤ 3 in Eq. (1)). If a matching pixel in the current frame is lost, it is not counted as a valid pixel in Eq. (1). By evaluating all the MAEs in the search range, the position of an object in the current frame is found as follows:

$$ \text{upper-left position of an object} = (p+d_x,\, q+d_y) + \underset{-3\le i,\, j\le 3}{\arg\min}\, \mathrm{MAE}(i,j) $$
(2)

These object positions will be recorded for use in the next stage.
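The search of Eqs. (1) and (2) can be sketched as follows, assuming quantized luma planes as numpy arrays, a boolean `valid` mask marking successfully received pixels of the current frame, and (M, N) as the width and height of the bounding box; boundary clipping is omitted for brevity.

```python
import numpy as np

def match_object(ref_q, cur_q, valid, p, q, M, N, d, radius=3):
    """Find the offset (i, j) in [-radius, radius]^2 minimizing the MAE of
    Eq. (1); lost pixels (valid == False) are excluded from the average.
    Returns the upper-left object position in the current frame, Eq. (2)."""
    dx, dy = d
    patch = ref_q[q:q + N, p:p + M].astype(np.int32)
    best_mae, best_ij = np.inf, (0, 0)
    for j in range(-radius, radius + 1):
        for i in range(-radius, radius + 1):
            x0, y0 = p + dx + i, q + dy + j
            cand = cur_q[y0:y0 + N, x0:x0 + M].astype(np.int32)
            ok = valid[y0:y0 + N, x0:x0 + M]
            if not ok.any():                  # nothing valid to compare
                continue
            mae = np.abs(patch - cand)[ok].mean()
            if mae < best_mae:
                best_mae, best_ij = mae, (i, j)
    return (p + dx + best_ij[0], q + dy + best_ij[1])
```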

3.3 Region-based patching

The process of the proposed object-based patching is shown in Fig. 5. We say that an object in the reference frame covers a region in the current frame if the extrapolated object obtained by object matching overlaps that region. First, the number of objects covering a missing macroblock is counted. If one to four objects cover the macroblock, the proposed object-based patching is used; otherwise, the conventional block-based patching of JM is used. We limit the maximum number of covering objects to four because a large number of covering objects usually indicates a lack of image features. In object-based patching, if a region is covered by more than one object (i.e., a collision), the object with the smallest extended boundary matching score (EBMS) is used for concealment. The EBMS is calculated from the bordering pixels just outside a block, as shown in Fig. 6. The object-based MV (obtained by object matching) then competes with the block-based MV (as in JM), and the one with the smaller EBMS is selected as the final MV for error concealment. For a hole region (pixels in a lost block without any matching object), the conventional block-based MV is used.

Fig. 5 Process of object-based patching

Fig. 6 Calculation of EBMS and region-based patching
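The decision logic of this stage can be condensed into the following sketch. Here `ebms` is a hypothetical callback returning the extended boundary-matching score of a candidate patch, and collision handling is simplified to macroblock granularity; per the text above, hole pixels not covered by the selected object still fall back to the block-based MV.

```python
def select_concealment(covering_objects, block_mv, ebms):
    """Choose between object-based and block-based patching for one
    missing macroblock, following Section 3.3."""
    if not 1 <= len(covering_objects) <= 4:
        return block_mv                  # too few/many objects: JM fallback
    # collision: keep the covering object with the smallest EBMS
    best_obj = min(covering_objects, key=ebms)
    # the winning object competes with the conventional block-based MV
    return best_obj if ebms(best_obj) < ebms(block_mv) else block_mv
```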

4 Experimental results

We conducted experiments based on JM 16.2 with the settings listed in Table 1. To facilitate error resilience, videos are encoded with the Baseline profile (IPPP structure and one reference frame for P frames). Dispersed FMO is activated with six slice groups per frame, as shown in Fig. 7. The built-in fast motion estimation algorithm of the H.264 JM, UMHexagonS (Unsymmetrical-cross Multi-Hexagon-grid Search) [2], is used to find the motion vectors of Inter MBs. Each slice is encapsulated in one packet and transmitted independently. Three packet loss rates (PLRs) of 5, 10, and 15 % with independent packet losses are tested: packet losses are modeled as independent random events, each occurring with probability equal to the specified PLR, and the random seed is taken from the current system time. All evaluated methods see the same loss pattern in each random trial. The MV resolution for error concealment is 1/4 pixel.
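This loss pattern can be reproduced with an i.i.d. Bernoulli model, e.g. the following sketch (the function name and seeding policy are illustrative):

```python
import time
import numpy as np

def make_loss_pattern(n_packets, plr, seed=None):
    """True marks a lost packet; each packet is dropped independently with
    probability plr (0.05, 0.10, or 0.15 in the experiments)."""
    if seed is None:
        seed = int(time.time())          # the paper seeds from system time
    rng = np.random.default_rng(seed)
    return rng.random(n_packets) < plr
```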

Table 1 H.264 encoder settings in this paper
Fig. 7 FMO of the dispersed mode with six slice groups

The proposed method is first compared with the error concealment scheme implemented in JM 16.2 [7] and with Ref. [10] (HMVE). In this experiment, one Intra frame is inserted every 10 frames (Intra period = 10). Six standard CIF video sequences (frame rate 30 Hz) with different visual characteristics, Football, News, Mobile, Foreman, Paris, and Stefan, are tested, using the whole image sequence in each simulation. Simulations are performed with three QP (quantization parameter) values, 20, 28, and 36, corresponding to high-, medium-, and low-quality video, respectively. Table 2 lists the PSNR values of the three evaluated methods. Although the proposed object-based method differs from the JM method only on blocks that contain multiple objects, it achieves notably better performance (0.40 to 1.41 dB gain averaged over the three QPs) on the investigated video sequences. A larger PSNR gap is observed for higher-quality (smaller QP) and more complex video sequences (such as Mobile). Note that the proposed method reduces to the JM block-based full-frame method [7] when FMO and slice groups are not employed. Compared with HMVE, consistently better results are observed because the proposed method incorporates spatial information into object-based error concealment; the HMVE algorithm was implemented and tested under the same slice-loss scenario and simulation conditions (with FMO). Simulation results on higher-resolution standard videos (Crew, 480p, and Ducks_take_off, 720p) are given in Table 3. The proposed method provides better PSNR performance in all evaluated cases, whereas the performance of Ref. [10] degrades. We conjecture that the candidate MVs obtained by motion extrapolation are less reliable at high resolutions because, compared with the CIF cases, fewer significant image features reside within a macroblock.

Table 2 PSNR comparison (in dB) of the proposed method with JM 16.2 [7] and Ref. [10] (Intra period = 10; the PSNR gain is averaged over PLR = 5, 10, 15 % and QP = 20, 28, 36)
Table 3 PSNR comparison (in dB) for higher-resolution videos (Intra period = 10; the PSNR gain is averaged over PLR = 5, 10, 15 % and QP = 20, 28, 36)
Table 4 Differential PSNR comparison (in dB) of the proposed method with Ref. [12] and Ref. [9] (QP = 28, Intra period = 10); ΔPSNR_Ref.[12] = PSNR_Ref.[12] − PSNR_JM10.2, ΔPSNR_Ref.[9] = PSNR_Ref.[9] − PSNR_JM10.2, ΔPSNR_proposed = PSNR_proposed − PSNR_JM16.2. Note that Ref. [12] and Ref. [9] were implemented on JM 10.2, whereas the proposed method was implemented on JM 16.2

In Table 4, the proposed method is compared with Ref. [12] and Ref. [9] in terms of differential PSNR relative to the error concealment method implemented in JM. The PSNR results are averaged over the same number of trials as used in the reference methods. Recall that Ref. [12] uses a broader set of MV candidates and a better boundary matching criterion (EBMA) than the error concealment scheme in JM, and Ref. [9] is a hardware-efficient implementation of the same approach. The proposed method achieves reliably good results and outperforms Ref. [12] (and thus Ref. [9]) in most cases. A more significant difference is observed on the Football and Stefan sequences, which contain fast-moving objects and are regarded as more difficult for error concealment.

The effect of the Intra period has also been investigated; the results for the Paris sequence with QP = 20 are shown in Table 5. Three Intra periods (10, 30, and 240) are tested, corresponding to inserting one I frame every 1/3, 1, and 8 s, respectively. Owing to error propagation, a long Intra period significantly worsens the performance of error concealment. Nevertheless, the proposed method provides consistently better results than JM 16.2 and Ref. [10], and the PSNR gap widens as the Intra period grows.

Table 5 The effect of the Intra period (PSNR for Paris, CIF, 1065 frames, 30 Hz, QP = 20)

Subjective concealment results on single frames for the proposed method, Ref. [7], and Ref. [10] are shown in Figs. 8, 9 and 10, which substantiate the superiority of the proposed method. For the Football sequence, better concealment is observed on the player even though the body motion is fast and irregular; the distinctive image features of the football player provide good clues for object identification. The News sequence has a static MPEG-4 banner and slow-moving news reporters in the foreground, with fast-moving dancers and poles on the background screen; both the reporters and the background screen are better concealed by the proposed method. The Mobile sequence has complex color distribution and object movement, so more objects are formed during object segmentation; markedly better concealment is observed within and around the calendar. In general, the proposed method achieves more significant improvements for low QP values because correct object segmentation and motion vector estimation rely on the quality of the received data.

Fig. 8 Subjective evaluations (Football): a JM16.2 (27.79 dB), b Ref. [10] (29.24 dB), c the proposed method (29.75 dB), d object segmentation (QP = 28, PLR = 10 %, frame number = 174)

Fig. 9 Subjective evaluations (News): a JM16.2 (32.71 dB), b Ref. [10] (35.23 dB), c the proposed method (35.92 dB), d object segmentation (QP = 28, PLR = 10 %, frame number = 218)

Fig. 10 Subjective evaluations (Mobile): a JM16.2 (24.62 dB), b Ref. [10] (28.26 dB), c the proposed method (28.42 dB), d object segmentation (QP = 28, PLR = 10 %, frame number = 58)

The proposed method has higher computational complexity, incurred mainly by object segmentation and object matching. With our current unoptimized code, the computational complexity of the proposed method is approximately 100 times that of JM and 10 times that of Ref. [10] for CIF sequences; the decoding frame rate for CIF sequences on our PC platform is about 1 frame/s. Hardware acceleration is therefore expected if the proposed method is to be used in real-time applications.

5 Conclusions

A new object-based error concealment technique has been proposed for H.264-coded video with FMO. The proposed method exploits both the spatial and temporal information of successfully received slices. Visual objects are identified in the reference frame based on color (grey-level) and motion consistency. The motion vector of an object is refined within a small search range around the initial object motion vector using a modified boundary-matching algorithm. Concealment is then performed on objects, and the issue of multiple correspondences is properly resolved. The proposed method has been evaluated under various encoding and transmission conditions: different test sequences, QPs, Intra periods, and packet loss rates. The proposed object-based method outperforms conventional block-based approaches when multiple objects coexist within a block. Compared with methods in the literature, it provides considerably better objective visual quality, especially for traditionally difficult cases and high-quality video. The major drawback of the proposed method is its high computational complexity; hardware acceleration is required for real-time applications.